Git Repo - linux.git/log

vhost: remove vhost_work_queue

vhost_work_queue is no longer used. Each driver is using the poll or vq
based queueing, so remove vhost_work_queue.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_scsi: flush IO vqs then send TMF rsp

With one worker we will always send the scsi cmd responses then send the
TMF rsp, because LIO will always complete the scsi cmds first then call
into us to send the TMF response.

With multiple workers, the IO vq workers could be running while the
TMF/ctl vq worker is running so this has us do a flush before completing
the TMF to make sure cmds are completed when it's work is later queued
and run.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_scsi: convert to vhost_vq_work_queue

Convert from vhost_work_queue to vhost_vq_work_queue so we can
remove vhost_work_queue.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_scsi: make SCSI cmd completion per vq

This patch separates the scsi cmd completion code paths so we can complete
cmds based on their vq instead of having all cmds complete on the same
worker/CPU. This will be useful with the next patches that allow us to
create mulitple worker threads and bind them to different vqs, and we can
have completions running on different threads/CPUs.

Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost_sock: convert to vhost_vq_work_queue

Convert from vhost_work_queue to vhost_vq_work_queue, so we can drop
vhost_work_queue.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: convert poll work to be vq based

This has the drivers pass in their poll to vq mapping and then converts
the core poll code to use the vq based helpers. In the next patches we
will allow vqs to be handled by different workers, so to allow drivers
to execute operations like queue, stop, flush, etc on specific polls/vqs
we need to know the mappings.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: take worker or vq for flushing

This patch has the core work flush function take a worker. When we
support multiple workers we can then flush each worker during device
removal, stoppage, etc. It also adds a helper to flush specific
virtqueues, so vhost-scsi can flush IO vqs from it's ctl vq.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: take worker or vq instead of dev for queueing

This patch has the core work queueing function take a worker for when we
support multiple workers. It also adds a helper that takes a vq during
queueing so modules can control which vq/worker to queue work on.

This temp leaves vhost_work_queue. It will be removed when the drivers
are converted in the next patches.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost, vhost_net: add helper to check if vq has work

In the next patches each vq might have different workers so one could
have work but others do not. For net, we only want to check specific vqs,
so this adds a helper to check if a vq has work pending and converts
vhost-net to use it.

Signed-off-by: Mike Christie <[email protected]>
Acked-by: Jason Wang <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: add vhost_worker pointer to vhost_virtqueue

This patchset allows userspace to map vqs to different workers. This
patch adds a worker pointer to the vq so in later patches in this set
we can queue/flush specific vqs and their workers.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: dynamically allocate vhost_worker

This patchset allows us to allocate multiple workers, so this has us
move from the vhost_worker that's embedded in the vhost_dev to
dynamically allocating it.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vhost: create worker at end of vhost_dev_set_owner

vsock can start queueing work after VHOST_VSOCK_SET_GUEST_CID, so
after we have called vhost_worker_create it can be calling
vhost_work_queue and trying to access the vhost worker/task. If
vhost_dev_alloc_iovecs fails, then vhost_worker_free could free
the worker/task from under vsock.

This moves vhost_worker_create to the end of vhost_dev_set_owner
where we know we can no longer fail in that path. If it fails
after the VHOST_SET_OWNER and userspace closes the device, then
the normal vsock release handling will do the right thing.

Signed-off-by: Mike Christie <[email protected]>
Message-Id: <20230626232307 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio_bt: call scheduler when we free unused buffs

For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").

This driver is similar so theoretically the same logic applies.

Signed-off-by: Xianting Tian <[email protected]>
Message-Id: <20230609131817 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-console: call scheduler when we free unused buffs

For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").

This driver is similar so theoretically the same logic applies.

Signed-off-by: Xianting Tian <[email protected]>
Message-Id: <20230609131817 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

virtio-crypto: call scheduler when we free unused buffs

For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").

This driver is similar so theoretically the same logic applies.

Signed-off-by: Xianting Tian <[email protected]>
Message-Id: <20230609131817 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vDPA/ifcvf: implement new accessors for vq_state

This commit implements a better layout of the
live migration bar, therefore the accessors for virtqueue
state have been refactored.

This commit also add a comment to the probing-ids list,
indicating this driver drives F2000X-PL virtio-net

Signed-off-by: Zhu Lingshan <[email protected]>
Message-Id: <20230612151420.1019504 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vDPA/ifcvf: detect and report max allowed vq size

Rather than a hardcode, this commit detects
and reports the max value of allowed size
of the virtqueues

Signed-off-by: Zhu Lingshan <[email protected]>
Message-Id: <20230612151420.1019504 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

vDPA/ifcvf: dynamic allocate vq data stores

This commit dynamically allocates the data
stores for the virtqueues based on
virtio_pci_common_cfg.num_queues.

Signed-off-by: Zhu Lingshan <[email protected]>
Message-Id: <20230612151420.1019504 [email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>

ovl: move all parameter handling into params.{c,h}

While initially I thought that we couldn't move all new mount api
handling into params.{c,h} it turns out it is possible. So this just
moves a good chunk of code out of super.c and into params.{c,h}.

Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>

kdb: move kdb_send_sig() declaration to a better header file

kdb_send_sig() is defined in the signal code and called from kdb,
but the declaration is part of the kdb internal code.
Move the declaration to the shared header to avoid the warning:

kernel/signal.c:4789:6: error: no previous prototype for 'kdb_send_sig' [-Werror=missing-prototypes]

Reported-by: Arnd Bergmann <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Daniel Thompson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Documentation: ABI: sysfs-class-net-qmi: pass_through contact update

Switch to the quicinc.com id.

Fixes: bd1af6b5fffd ("Documentation: ABI: sysfs-class-net-qmi: document pass-through file")
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

tcp: annotate data races in __tcp_oow_rate_limited()

request sockets are lockless, __tcp_oow_rate_limited() could be called
on the same object from different cpus. This is harmless.

Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report.

Fixes: 4ce7e93cb3fe ("tcp: rate limit ACK sent by SYN_RECV request sockets")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'wireguard-fixes'

Jason A. Donenfeld says:

====================
wireguard fixes for 6.4.2/6.5-rc1

Sorry to send these patches during the merge window, but they're net
fixes, not netdev enhancements, and while I'd ordinarily wait anyway,
I just got a first bug report for one of these fixes, which I originally
had thought was mostly unlikely. So please apply the following three
patches to net:

1) Make proper use of nr_cpu_ids with cpumask_next(), rather than
   awkwardly using modulo, to handle dynamic CPU topology changes.
   Linus noticed this a while ago and pointed it out, and today a user
   actually got hit by it.

2) Respect persistent keepalive and other staged packets when setting
   the private key after the interface is already up.

3) Use timer_delete_sync() instead of del_timer_sync(), per the
   documentation.
====================

Signed-off-by: David S. Miller <[email protected]>

wireguard: timers: move to using timer_delete_sync

The documentation says that del_timer_sync is obsolete, and code should
use the equivalent timer_delete_sync instead, so switch to it.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

wireguard: netlink: send staged packets when setting initial private key

Packets bound for peers can queue up prior to the device private key
being set. For example, if persistent keepalive is set, a packet is
queued up to be sent as soon as the device comes up. However, if the
private key hasn't been set yet, the handshake message never sends, and
no timer is armed to retry, since that would be pointless.

But, if a user later sets a private key, the expectation is that those
queued packets, such as a persistent keepalive, are actually sent. So
adjust the configuration logic to account for this edge case, and add a
test case to make sure this works.

Maxim noticed this with a wg-quick(8) config to the tune of:

    [Interface]
    PostUp = wg set %i private-key somefile

    [Peer]
    PublicKey = ...
    Endpoint = ...
    PersistentKeepalive = 25

Here, the private key gets set after the device comes up using a PostUp
script, triggering the bug.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: [email protected]
Reported-by: Maxim Cournoyer <[email protected]>
Tested-by: Maxim Cournoyer <[email protected]>
Link: https://lore.kernel.org/wireguard/[email protected]/
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

wireguard: queueing: use saner cpu selection wrapping

Using `% nr_cpumask_bits` is slow and complicated, and not totally
robust toward dynamic changes to CPU topologies. Rather than storing the
next CPU in the round-robin, just store the last one, and also return
that value. This simplifies the loop drastically into a much more common
pattern.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Cc: [email protected]
Reported-by: Linus Torvalds <[email protected]>
Tested-by: Manuel Leiner <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

samples: pktgen: fix append mode failed issue

Each sample script sources functions.sh before parameters.sh
which makes $APPEND undefined when trapping EXIT no matter in
append mode or not. Due to this when sample scripts finished
they always do "pgctrl reset" which resets pktgen config.

So move trap to each script after sourcing parameters.sh
and trap EXIT explicitly.

Signed-off-by: J.J. Martzki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

selftests/net: Add xt_policy config for xfrm_policy test

When running Kselftests with the current selftests/net/config
the following problem can be seen with the net:xfrm_policy.sh
selftest:

  # selftests: net: xfrm_policy.sh
  [   41.076721] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
  [   41.094787] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
  [   41.107635] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
  # modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
  # iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
  # Perhaps iptables or your kernel needs to be upgraded.
  # modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
  # iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
  # Perhaps iptables or your kernel needs to be upgraded.
  # SKIP: Could not insert iptables rule
  ok 1 selftests: net: xfrm_policy.sh # SKIP

This is because IPsec "policy" match support is not available
to the kernel.

This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module
to the selftests/net/config file, so that `make
kselftest-merge` can take this into consideration.

Signed-off-by: Daniel Díaz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: fix net_dev_start_xmit trace event vs skb_transport_offset()

After blamed commit, we must be more careful about using
skb_transport_offset(), as reminded us by syzbot:

WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 skb_transport_offset include/linux/skbuff.h:2977 [inline]
WARNING: CPU: 0 PID: 10 at include/linux/skbuff.h:2868 perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14
Modules linked in:
CPU: 0 PID: 10 Comm: kworker/u4:1 Not tainted 6.1.30-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet
RIP: 0010:skb_transport_header include/linux/skbuff.h:2868 [inline]
RIP: 0010:skb_transport_offset include/linux/skbuff.h:2977 [inline]
RIP: 0010:perf_trace_net_dev_start_xmit+0x89a/0xce0 include/trace/events/net.h:14
Code: 8b 04 25 28 00 00 00 48 3b 84 24 c0 00 00 00 0f 85 4e 04 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc e8 56 22 01 fd <0f> 0b e9 f6 fc ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 86 f9 ff
RSP: 0018:ffffc900002bf700 EFLAGS: 00010293
RAX: ffffffff8485d8ca RBX: 000000000000ffff RCX: ffff888100914280
RDX: 0000000000000000 RSI: 000000000000ffff RDI: 000000000000ffff
RBP: ffffc900002bf818 R08: ffffffff8485d5b6 R09: fffffbfff0f8fb5e
R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff110217d8f67
R13: ffff88810bec7b3a R14: dffffc0000000000 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f96cf6d52f0 CR3: 000000012224c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
[<ffffffff84715e35>] trace_net_dev_start_xmit include/trace/events/net.h:14 [inline]
[<ffffffff84715e35>] xmit_one net/core/dev.c:3643 [inline]
[<ffffffff84715e35>] dev_hard_start_xmit+0x705/0x980 net/core/dev.c:3660
[<ffffffff8471a232>] __dev_queue_xmit+0x16b2/0x3370 net/core/dev.c:4324
[<ffffffff85416493>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
[<ffffffff85416493>] batadv_send_skb_packet+0x3f3/0x680 net/batman-adv/send.c:108
[<ffffffff85416744>] batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127
[<ffffffff853bc52a>] batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline]
[<ffffffff853bc52a>] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline]
[<ffffffff853bc52a>] batadv_iv_send_outstanding_bat_ogm_packet+0x69a/0x840 net/batman-adv/bat_iv_ogm.c:1701
[<ffffffff8151023c>] process_one_work+0x8ac/0x1170 kernel/workqueue.c:2289
[<ffffffff81511938>] worker_thread+0xaa8/0x12d0 kernel/workqueue.c:2436

Fixes: 66e4c8d95008 ("net: warn if transport header was not set")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode

There was a regression introduced by the blamed commit, where pinging to
a VLAN-unaware bridge would fail with the repeated message "Couldn't
decode source port" coming from the tagging protocol driver.

When receiving packets with a bridge_vid as determined by
dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode:
- source_port = 0 (which isn't really valid, more like "don't know")
- switch_id = 0 (which isn't really valid, more like "don't know")
- vbid = value in range 1-7

Since the blamed patch has reversed the order of the checks, we are now
going to believe that source_port != -1 and switch_id != -1, so they're
valid, but they aren't.

The minimal solution to the problem is to only populate source_port and
switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero,
i.e. the source port information is trustworthy.

Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode

According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.

After running these commands, I am being faced with the following
lockdep splat:

$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up

  ========================================================
  WARNING: possible irq lock inversion dependency detected
  6.4.0-04295-g31b577b4bd4a #603 Not tainted
  --------------------------------------------------------
  swapper/1/0 just changed the state of lock:
  ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
  but this lock took another, SOFTIRQ-unsafe lock in the past:
   (&ocelot->stats_lock){+.+.}-{3:3}

  and interrupts could create inverse lock ordering between them.

  other info that might help us debug this:
  Chain exists of:
    &br->lock --> &br->hash_lock --> &ocelot->stats_lock

   Possible interrupt unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&ocelot->stats_lock);
                                 local_irq_disable();
                                 lock(&br->lock);
                                 lock(&br->hash_lock);
    <Interrupt>
      lock(&br->lock);

   *** DEADLOCK ***

(details about the 3 locks skipped)

swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).

Documentation/locking/lockdep-design.rst says:

| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.

(...)

| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
|    <hardirq-safe>   ->  <hardirq-unsafe>
|    <softirq-safe>   ->  <softirq-unsafe>

Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().

Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.

There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:

ocelot_port_get_stats64+0x3c/0x1e0
felix_get_stats64+0x20/0x38
dsa_slave_get_stats64+0x3c/0x60
dev_get_stats+0x74/0x2c8
rtnl_fill_stats+0x4c/0x150
rtnl_fill_ifinfo+0x5cc/0x7b8
rtmsg_ifinfo_build_skb+0xe4/0x150
rtmsg_ifinfo+0x5c/0xb0
__dev_notify_flags+0x58/0x200
__dev_set_promiscuity+0xa0/0x1f8
dev_set_promiscuity+0x30/0x70
macsec_dev_change_rx_flags+0x68/0x88
__dev_set_promiscuity+0x1a8/0x1f8
__dev_set_rx_mode+0x74/0xa8
dev_uc_add+0x74/0xa0
fdb_add_hw_addr+0x68/0xd8
fdb_add_local+0xc4/0x110
br_fdb_add_local+0x54/0x88
br_add_if+0x338/0x4a0
br_add_slave+0x20/0x38
do_setlink+0x3a4/0xcb8
rtnl_newlink+0x758/0x9d0
rtnetlink_rcv_msg+0x2f0/0x550
netlink_rcv_skb+0x128/0x148
rtnetlink_rcv+0x24/0x38

the plain English explanation for it is:

The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.

As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.

Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:

dev_set_promiscuity(swp0)
-> rtmsg_ifinfo()
   -> dev_get_stats()
      -> ocelot_port_get_stats64()

with a calling context that lockdep doesn't like (br->hash_lock held).

Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that

(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
    spinlock.

Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.

The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.

I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:

fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.

In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.

Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.

When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.

With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.

Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:

- Fix a type signature mismatch

- Drop Christoph as maintainer

* tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: drop me [hch] from MAINTAINERS for iomap
fs: iomap: Change the type of blocksize from 'int' to 'unsigned int' in iomap_file_buffered_write_punch_delalloc

Merge tag 'v6.5/vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
"A fix for the backing file work from this cycle.

  When init_file() failed it would call file_free_rcu() on the file
  allocated by the caller of init_file(). It naively assumed that the
  correct cleanup operation would be called depending on whether it is a
  regular file or a backing file. However, that presupposes that the
  FMODE_BACKING flag would already be set which it won't be as that is
  done in the caller of init_file().

  Fix that bug by moving the cleanup of the allocated file into the
  caller where it belongs in the first place. There's no good reason for
  init_file() to consume resources it didn't allocate. This is a
  mainline only fix and was reported by syzbot. The fix was validated by
  syzbot against the provided reproducer"

* tag 'v6.5/vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: move cleanup from init_file() into its callers

Merge tag 'i2c-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c updates from Wolfram Sang:

- I2C has now a co-maintainer taking care of the host drivers. Welcome
   Andi Shyti and have fun!

- platform remove callback converted to return void in drivers

- simplify drivers by using devm_clk_get_enabled()

- introduce i2c_get_match_data() to avoid more boilerplate code
   (especially since the core stopped delivering an i2c_device_id)

- and the usual bunch of driver updates

* tag 'i2c-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
  i2c: uniphier: Use devm_clk_get_enabled()
  i2c: uniphier-f: Use devm_clk_get_enabled()
  i2c: owl: Use devm_clk_get_enabled()
  i2c: lpc2k: Use devm_clk_get_enabled()
  i2c: hix5hd2: Use devm_clk_get_enabled()
  i2c: sun6i-p2wi: Use devm_clk_get_enabled()
  i2c: pasemi-platform: Use devm_clk_get_enabled()
  i2c: mt7621: Use devm_clk_get_enabled()
  i2c: xiic: Use devm_clk_get_enabled()
  i2c: davinci: Use platform table macro over module_alias
  i2c: ocores: use devm_ managed clks
  i2c: nomadik: Use dev_err_probe() whenever possible
  i2c: nomadik: Use devm_clk_get_enabled()
  i2c: nomadik: Remove unnecessary goto label
  usb: typec: ucsi: Mark dGPUs as DEVICE scope
  i2c: wmt: Use devm_platform_get_and_ioremap_resource()
  i2c: versatile: Use devm_platform_get_and_ioremap_resource()
  i2c: hix5hd2: Add I2C_M_STOP flag support for i2c-hix5hd2 driver.
  i2c: mpc: Use of_property_read_reg() to parse "reg"
  i2c: imx-lpi2c: Don't open-code DIV_ROUND_UP
  ...

Merge tag 'parisc-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:

- Add missing cacheflush() syscall

- Fix STI console on 64-bit-only machines

- Move kernel debug options to Kconfig.debug

- Lots of warning fixes in arch/parisc/ and drivers/parisc/ when
   compiled with W=1

- Enable some more graphics drivers in refreshed defconfigs

* tag 'parisc-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (29 commits)
  parisc: Refresh defconfigs
  parisc: irq: Add irq-related function declarations
  parisc: Move init function declarations into header file
  parisc: dino: Make dino_init() returning void
  parisc: lba_pci: Mark two variables __maybe_unused
  parisc: unaligned: Include header file to avoid missing prototype warnings
  parisc: signal: Mark do_notify_resume() and sys_rt_sigreturn() asmlinkage
  parisc: unwind: Mark start and stop variables __maybe_unused
  parisc: init: Drop unused variable end_paddr
  parisc: traps: Mark functions static
  parisc: processor: Fix kdoc for init_cpu_profiler()
  parisc: sys_parisc: parisc_personality() is called from asm code
  parisc: ccio-dma: Fix kdoc and compiler warnings
  parisc: pdc_stable: Fix kdoc and compiler warnings
  parisc: pci-dma: Make pcxl_alloc_range() static
  parisc: Mark image_size __maybe_unused in perf_write()
  parisc: module: Mark symindex __maybe_unused
  parisc: pdc_chassis: Fix kdoc warnings
  parisc: firmware: Fix kdoc warnings
  parisc: drivers: Fix kdoc warnings
  ...

xfs: fix the calculation for "end" and "length"

The value of "end" should be "start + length - 1".

Signed-off-by: Shiyang Ruan <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>

xfs: fix xfs_btree_query_range callers to initialize btree rec fully

Use struct initializers to ensure that the xfs_btree_irecs passed into
the query_range function are completely initialized. No functional
changes, just closing some sloppy hygiene.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: validate fsmap offsets specified in the query keys

Improve the validation of the fsmap offset fields in the query keys and
move the validation to the top of the function now that we have pushed
the low key adjustment code downwards.

Also fix some indenting issues that aren't worth a separate patch.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: fix logdev fsmap query result filtering

The external log device fsmap backend doesn't have an rmapbt to query,
so it's wasteful to spend time initializing the rmap_irec objects.
Worse yet, the log could (someday) be longer than 2^32 fsblocks, so
using the rmap irec structure will result in integer overflows.

Fix this mess by computing the start address that we want from keys[0]
directly, and use the daddr-based record filtering algorithm that we
also use for rtbitmap queries.

Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: clean up the rtbitmap fsmap backend

The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to
spend time initializing the rmap_irec objects. Worse yet, the logic to
query the rtbitmap is spread across three separate functions, which is
unnecessarily difficult to follow.

Compute the start rtextent that we want from keys[0] directly and
combine the functions to avoid passing parameters around everywhere, and
consolidate all the logic into a single function. At one point many
years ago I intended to use __xfs_getfsmap_rtdev as the launching point
for realtime rmapbt queries, but this hasn't been the case for a long
time.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: fix getfsmap reporting past the last rt extent

The realtime section ends at the last rt extent. If the user configures
the rt geometry with an extent size that is not an integer factor of the
number of rt blocks, it's possible for there to be rt blocks past the
end of the last rt extent. These tail blocks cannot ever be allocated
and will cause corruption reports if the last extent coincides with the
end of an rt bitmap block, so do not report consider them for the
GETFSMAP output.

Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: fix integer overflows in the fsmap rtbitmap and logdev backends

It's not correct to use the rmap irec structure to hold query key
information to query the rtbitmap because the realtime volume can be
longer than 2^32 fsblocks in length. Because the rt volume doesn't have
allocation groups, introduce a daddr-based record filtering algorithm
and compute the rtextent values using 64-bit variables. The same
problem exists in the external log device fsmap implementation, so use
the same solution to fix it too.

After this patch, all the code that touches info->low and info->high
under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev are unnecessary.
Cleaning this up will be done in subsequent patches.

Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

xfs: fix interval filtering in multi-step fsmap queries

I noticed a bug in ranged GETFSMAP queries:

# xfs_io -c 'fsmap -vvvv' /opt
EXT: DEV  BLOCK-RANGE           OWNER              FILE-OFFSET      AG AG-OFFSET           TOTAL
   0: 8:80 [0..7]:               static fs metadata                  0  (0..7)                  8
<snip>
   9: 8:80 [192..223]:           137                0..31            0  (192..223)             32
# xfs_io -c 'fsmap -vvvv -d 208 208' /opt
#

That's not right -- we asked what block maps block 208, and we should've
received a mapping for inode 137 offset 16.  Instead, we get nothing.

The root cause of this problem is a mis-interaction between the fsmap
code and how btree ranged queries work.  xfs_btree_query_range returns
any btree record that overlaps with the query interval, even if the
record starts before or ends after the interval.  Similarly, GETFSMAP is
supposed to return a recordset containing all records that overlap the
range queried.

However, it's possible that the recordset is larger than the buffer that
the caller provided to convey mappings to userspace.  In /that/ case,
userspace is supposed to copy the last record returned to fmh_keys[0]
and call GETFSMAP again.  In this case, we do not want to return
mappings that we have already supplied to the caller.  The call to
xfs_btree_query_range is the same, but now we ignore any records that
start before fmh_keys[0].

Unfortunately, we didn't implement the filtering predicate correctly.
The predicate should only be called when we're calling back for more
records.  Accomplish this by setting info->low.rm_blockcount to a
nonzero value and ensuring that it is cleared as necessary.  As a
result, we no longer want to adjust dkeys[0] in the main setup function
because that's confusing.

This patch doesn't touch the logdev/rtbitmap backends because they have
bigger problems that will be addressed by subsequent patches.

Found via xfs/556 with parent pointers enabled.

Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl")
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>

Merge branch 'octeontx2-af-fixes'

Hariprasad Kelam says:

====================
octeontx2-af: MAC block fixes for CN10KB

This patch set contains fixes for the issues encountered in testing
CN10KB MAC block RPM_USX.

Patch1: firmware to kernel communication is not working due to wrong
        interrupt configuration. CSR addresses are corrected.

Patch2: NIX to RVU PF mapping errors encountered due to wrong firmware
        config. Corrects this mapping error.

Patch3: Driver is trying to access non exist cgx/lmac which is resulting
        in kernel panic. Address this issue by adding proper checks.

Patch4: MAC features are not getting reset on FLR. Fix the issue by
        resetting the stale config.
====================

Signed-off-by: David S. Miller <[email protected]>

octeontx2-af: Reset MAC features in FLR

AF driver configures MAC features like internal loopback and PFC upon
receiving the request from PF and its VF netdev. But these
features are not getting reset in FLR. This patch fixes the issue by
resetting the same.

Fixes: 23999b30ae67 ("octeontx2-af: Enable or disable CGX internal loopback")
Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support")
Signed-off-by: Hariprasad Kelam <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

octeontx2-af: Add validation before accessing cgx and lmac

with the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous and CGX blocks are also
noncontiguous. But during RVU driver initialization, the driver
is assuming they are contiguous and trying to access
cgx or lmac with their id which is resulting in kernel panic.

This patch fixes the issue by adding proper checks.

[   23.219150] pc : cgx_lmac_read+0x38/0x70
[   23.219154] lr : rvu_program_channels+0x3f0/0x498
[   23.223852] sp : ffff000100d6fc80
[   23.227158] x29: ffff000100d6fc80 x28: ffff00010009f880 x27:
000000000000005a
[   23.234288] x26: ffff000102586768 x25: 0000000000002500 x24:
fffffffffff0f000

Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Hariprasad Kelam <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

octeontx2-af: Fix mapping for NIX block from CGX connection

Firmware configures NIX block mapping for all MAC blocks.
The current implementation reads the configuration and
creates the mapping between RVU PF and NIX blocks. But
this configuration is only valid for silicons that support
multiple blocks. For all other silicons, all MAC blocks
map to NIX0.

This patch corrects the mapping by adding a check for the same.

Fixes: c5a73b632b90 ("octeontx2-af: Map NIX block from CGX connection")
Signed-off-by: Hariprasad Kelam <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

octeontx2-af: cn10kb: fix interrupt csr addresses

The current design is that, for asynchronous events like link_up and
link_down firmware raises the interrupt to kernel. The previous patch
which added RPM_USX driver has a bug where it uses old csr addresses
for configuring interrupts. Which is resulting in losing interrupts
from source firmware.

This patch fixes the issue by correcting csr addresses.

Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support")
Signed-off-by: Hariprasad Kelam <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

nvme-tcp: Fix comma-related oops

Fix a comma that should be a semicolon.  The comma is at the end of an
if-body and thus makes the statement after (a bvec_set_page()) conditional
too, resulting in an oops because we didn't fill out the bio_vec[]:

    BUG: kernel NULL pointer dereference, address: 0000000000000008
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    ...
    Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
    RIP: 0010:skb_splice_from_iter+0xf1/0x370
    ...
    Call Trace:
     tcp_sendmsg_locked+0x3a6/0xdd0
     tcp_sendmsg+0x31/0x50
     inet_sendmsg+0x47/0x80
     sock_sendmsg+0x99/0xb0
     nvme_tcp_try_send_data+0x149/0x490 [nvme_tcp]
     nvme_tcp_try_send+0x1b7/0x300 [nvme_tcp]
     nvme_tcp_io_work+0x40/0xc0 [nvme_tcp]
     process_one_work+0x21c/0x430
     worker_thread+0x54/0x3e0
     kthread+0xf8/0x130

Fixes: 7769887817c3 ("nvme-tcp: Use sendmsg(MSG_SPLICE_PAGES) rather then sendpage")
Reported-by: Aurelien Aptel <[email protected]>
Link: https://lore.kernel.org/r/253mt0il43o.fsf@mtr-vdi-124.i-did-not-set--mail-host-address--so-tickle-me/
Signed-off-by: David Howells <[email protected]>
cc: Sagi Grimberg <[email protected]>
cc: Willem de Bruijn <[email protected]>
cc: Keith Busch <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Christoph Hellwig <[email protected]>
cc: Chaitanya Kulkarni <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Jens Axboe <[email protected]>
cc: Matthew Wilcox <[email protected]>
cc: [email protected]
cc: [email protected]
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fs: move cleanup from init_file() into its callers

The use of file_free_rcu() in init_file() to free the struct that was
allocated by the caller was hacky and we got what we deserved.

Let init_file() and its callers take care of cleaning up each after
their own allocated resources on error.

Fixes: 62d53c4a1dfe ("fs: use backing_file container for internal files with "fake" f_path") # mainline only
Reported-and-tested-by: [email protected]
Signed-off-by: Amir Goldstein <[email protected]>
Message-Id: <20230701171134 [email protected]>
Signed-off-by: Christian Brauner <[email protected]>

Merge tag 'csky-for-linus-6.5' of https://github.com/c-sky/csky-linux

Pull arch/csky update from Guo Ren:

- Correct thread.trap_no restore of uprobe

* tag 'csky-for-linus-6.5' of https://github.com/c-sky/csky-linux:
csky: uprobes: Restore thread.trap_no

Merge tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Stable fixes and other bugfixes:

   - nfs: don't report STATX_BTIME in ->getattr

   - Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return'
     since it breaks NFSv4 state recovery.

   - NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION

   - Fix the NFSv4.2 xattr cache shrinker_id

   - Force a ctime update after a NFSv4.2 SETXATTR call

  Features and cleanups:

   - NFS and RPC over TLS client code from Chuck Lever

   - Support for use of abstract unix socket addresses with the rpcbind
     daemon

   - Sysfs API to allow shutdown of the kernel RPC client and prevent
     umount() hangs if the server is known to be permanently down

   - XDR cleanups from Anna"

* tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits)
  Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
  NFS: Don't cleanup sysfs superblock entry if uninitialized
  nfs: don't report STATX_BTIME in ->getattr
  NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
  NFSv4.2: fix wrong shrinker_id
  NFSv4: Clean up some shutdown loops
  NFS: Cancel all existing RPC tasks when shutdown
  NFS: add sysfs shutdown knob
  NFS: add a sysfs link to the acl rpc_client
  NFS: add a sysfs link to the lockd rpc_client
  NFS: Add sysfs links to sunrpc clients for nfs_clients
  NFS: add superblock sysfs entries
  NFS: Make all of /sys/fs/nfs network-namespace unique
  NFS: Open-code the nfs_kset kset_create_and_add()
  NFS: rename nfs_client_kobj to nfs_net_kobj
  NFS: rename nfs_client_kset to nfs_kset
  NFS: Add an "xprtsec=" NFS mount option
  NFS: Have struct nfs_client carry a TLS policy field
  SUNRPC: Add a TCP-with-TLS RPC transport class
  SUNRPC: Capture CMSG metadata on client-side receive
  ...

Merge tag 'x86-urgent-2023-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
"A single regression fix for x86:

  Moving the invocation of arch_cpu_finalize_init() earlier in the boot
  process caused a boot regression on IBT enabled system.

  The root cause is not the move of arch_cpu_finalize_init() itself. The
  system fails to boot because the subsequent efi_enter_virtual_mode()
  code has a non-IBT safe EFI call inside. This was not noticed before
  because IBT was enabled after the EFI initialization.

  Switching the EFI call to use the IBT safe wrapper cures the problem"

* tag 'x86-urgent-2023-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Make efi_set_virtual_address_map IBT safe

Merge tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

- Remove the deprecated rule to build *.dtbo from *.dts

- Refactor section mismatch detection in modpost

- Fix bogus ARM section mismatch detections

- Fix error of 'make gtags' with O= option

- Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error
   with the latest LLVM version

- Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed

- Ignore more compiler-generated symbols for kallsyms

- Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles

- Enable more kernel-doc warnings with W=2

- Refactor <linux/export.h> by generating KSYMTAB data by modpost

- Deprecate <asm/export.h> and <asm-generic/export.h>

- Remove the EXPORT_DATA_SYMBOL macro

- Move the check for static EXPORT_SYMBOL back to modpost, which makes
   the build faster

- Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm

- Warn missing MODULE_DESCRIPTION when building modules with W=1

- Make 'make clean' robust against too long argument error

- Exclude more objects from GCOV to fix CFI failures with GCOV

- Allow 'make modules_install' to install modules.builtin and
   modules.builtin.modinfo even when CONFIG_MODULES is disabled

- Include modules.builtin and modules.builtin.modinfo in the
   linux-image Debian package even when CONFIG_MODULES is disabled

- Revive "Entering directory" logging for the latest Make version

* tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits)
  modpost: define more R_ARM_* for old distributions
  kbuild: revive "Entering directory" for Make >= 4.4.1
  kbuild: set correct abs_srctree and abs_objtree for package builds
  scripts/mksysmap: Ignore prefixed KCFI symbols
  kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb
  kbuild: builddeb: always make modules_install, to install modules.builtin*
  modpost: continue even with unknown relocation type
  modpost: factor out Elf_Sym pointer calculation to section_rel()
  modpost: factor out inst location calculation to section_rel()
  kbuild: Disable GCOV for *.mod.o
  kbuild: Fix CFI failures with GCOV
  kbuild: make clean rule robust against too long argument error
  script: modpost: emit a warning when the description is missing
  kbuild: make modules_install copy modules.builtin(.modinfo)
  linux/export.h: rename 'sec' argument to 'license'
  modpost: show offset from symbol for section mismatch warnings
  modpost: merge two similar section mismatch warnings
  kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion
  modpost: use null string instead of NULL pointer for default namespace
  modpost: squash sym_update_namespace() into sym_add_exported()
  ...

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
"Fix memory corruption (overwriting the kmalloc redzone) when saving
the SVE state while in SVE streaming mode"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: sme: Use STR P to clear FFR context field in streaming SVE mode

Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull CXL updates from Dan Williams:
"The highlights in terms of new functionality are support for the
  standard CXL Performance Monitor definition that appeared in CXL 3.0,
  support for device sanitization (wiping all data from a device),
  secure-erase (re-keying encryption of user data), and support for
  firmware update. The firmware update support is notable as it reuses
  the simple sysfs_upload interface to just cat(1) a blob to a sysfs
  file and pipe that to the device.

  Additionally there are a substantial number of cleanups and
  reorganizations to get ready for RCH error handling (RCH == Restricted
  CXL Host == current shipping hardware generation / pre CXL-2.0
  topologies) and type-2 (accelerator / vendor specific) devices.

  For vendor specific devices they implement a subset of what the
  generic type-3 (generic memory expander) driver expects. As a result
  the rework decouples optional infrastructure from the core driver
  context.

  For RCH topologies, where the specification working group did not want
  to confuse pre-CXL-aware operating systems, many of the standard
  registers are hidden which makes support standard bus features like
  AER (PCIe Advanced Error Reporting) difficult. The rework arranges for
  the driver to help the PCI-AER core. Bjorn is on board with this
  direction but a late regression disocvery means the completion of this
  functionality needs to cook a bit longer, so it is code
  reorganizations only for now.

  Summary:

   - Add infrastructure for supporting background commands along with
     support for device sanitization and firmware update

   - Introduce a CXL performance monitoring unit driver based on the
     common definition in the specification.

   - Land some preparatory cleanup and refactoring for the anticipated
     arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
     topology) error handling.

   - Rework CPU cache management with respect to region configuration
     (device hotplug or other dynamic changes to memory interleaving)

   - Fix region reconfiguration vs CXL decoder ordering rules"

* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits)
  cxl: Fix one kernel-doc comment
  cxl/pci: Use correct flag for sanitize polling
  docs: perf: Minimal introduction the the CXL PMU device and driver
  perf: CXL Performance Monitoring Unit driver
  tools/testing/cxl: add firmware update emulation to CXL memdevs
  tools/testing/cxl: Use named effects for the Command Effect Log
  tools/testing/cxl: Fix command effects for inject/clear poison
  cxl: add a firmware update mechanism using the sysfs firmware loader
  cxl/test: Add Secure Erase opcode support
  cxl/mem: Support Secure Erase
  cxl/test: Add Sanitize opcode support
  cxl/mem: Wire up Sanitization support
  cxl/mbox: Add sanitization handling machinery
  cxl/mem: Introduce security state sysfs file
  cxl/mbox: Allow for IRQ_NONE case in the isr
  Revert "cxl/port: Enable the HDM decoder capability for switch ports"
  cxl/memdev: Formalize endpoint port linkage
  cxl/pci: Unconditionally unmask 256B Flit errors
  cxl/region: Manage decoder target_type at decoder-attach time
  cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
  ...

Merge tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull nvdimm and DAX updates from Vishal Verma:
"This is mostly small cleanups and fixes, with the biggest change being
  the change to the DAX fault handler allowing it to return
  VM_FAULT_HWPOISON.

  Summary:

   - DAX fixes and cleanups including a use after free, extra
     references, and device unregistration, and a redundant variable.

   - Allow the DAX fault handler to return VM_FAULT_HWPOISON

   - A few libnvdimm cleanups such as making some functions and
     variables static where sufficient.

   - Add a few missing prototypes for wrapped functions in
     tools/testing/nvdimm"

* tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: enable dax fault handler to report VM_FAULT_HWPOISON
  nvdimm: make security_show static
  nvdimm: make nd_class variable static
  dax/kmem: Pass valid argument to memory_group_register_static
  fsdax: remove redundant variable 'error'
  dax: Cleanup extra dax_region references
  dax: Introduce alloc_dev_dax_id()
  dax: Use device_unregister() in unregister_dax_mapping()
  dax: Fix dax_mapping_release() use after free
  tools/testing/nvdimm: Drop empty platform remove function
  libnvdimm: mark 'security_show' static again
  testing: nvdimm: add missing prototypes for wrapped functions
  dax: fix missing-prototype warnings

Merge tag 'sysctl-fixes-v2-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull another sysctl fix from Luis Chamberlain:
"Just one minor nit I forgot to merge"

* tag 'sysctl-fixes-v2-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: set variable sysctl_mount_point storage-class-specifier to static

Merge tag 'flex-array-transformations-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flexible-array update from Gustavo Silva:
"Transform a zero-length array into a C99 flexible-array member.

  This addresses a build failure with Clang by fixing multiple
  '-Warray-bounds' warnings in drivers/staging/ks7010/ks_wlan_net.c"

* tag 'flex-array-transformations-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  uapi: wireless: Replace zero-length array with flexible-array member

pid: use struct_size_t() helper

Before commit d67790ddf021 ("overflow: Add struct_size_t() helper") only
struct_size() existed, which expects a valid pointer instance containing
the flexible array.

However, when we determine the default struct pid allocation size for
the associated kmem cache of a pid namespace we need to take the nesting
depth of the pid namespace into account without an variable instance
necessarily being available.

In commit b69f0aeb0689 ("pid: Replace struct pid 1-element array with
flex-array") we used to handle this the old fashioned way and cast NULL
to a struct pid pointer type. However, we do apparently have a dedicated
struct_size_t() helper for exactly this case. So switch to that.

Suggested-by: Kees Cook <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: Update do_vmi_align_munmap() return semantics

Since do_vmi_align_munmap() will always honor the downgrade request on
the success, the callers no longer have to deal with confusing return
codes.  Since all callers that request downgrade actually want the lock
to be dropped, change the downgrade to an unlock request.

Note that the lock still needs to be held in read mode during the page
table clean up to avoid races with a map request.

Update do_vmi_align_munmap() to return 0 for success.  Clean up the
callers and comments to always expect the unlock to be honored on the
success path.  The error path will always leave the lock untouched.

As part of the cleanup, the wrapper function do_vmi_munmap() and callers
to the wrapper are also updated.

Suggested-by: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: Always downgrade mmap_lock if requested

Now that stack growth must always hold the mmap_lock for write, we can
always downgrade the mmap_lock to read and safely unmap pages from the
page table, even if we're next to a stack.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

xtensa: fix lock_mm_and_find_vma in case VMA not found

MMU version of lock_mm_and_find_vma releases the mm lock before
returning when VMA is not found. Do the same in noMMU version.
This fixes hang on an attempt to handle protection fault.

Fixes: d85a143b69ab ("xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion")
Signed-off-by: Max Filippov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

docs: networking: Update codeaurora references for rmnet

source.codeaurora.org is no longer accessible and so the reference link
in the documentation is not useful. Use iproute2 instead as it has a
rmnet module for configuration.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Sean Tranchetti <[email protected]>
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

docs: netdev: broaden mailbot to all MAINTAINERS

Reword slightly now that all MAINTAINERS have access to the commands.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net: usb: cdc_ether: add u-blox 0x1313 composition.

Add CDC-ECM support for LARA-R6 01B.

The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 configurations (default, RmNet, ECM)
are the same.

In CDC-ECM mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: CDC-ECM interface

Signed-off-by: Davide Tronchin <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge tag 'kvm-x86-vmx-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.5:

- Fix missing/incorrect #GP checks on ENCLS

- Use standard mmu_notifier hooks for handling APIC access page

- Misc cleanups

Merge tag 'kvm-x86-svm-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM SVM changes for 6.5:

- Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state

- Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
   are handled in the fastpath

- Print more descriptive information about the status of SEV and SEV-ES during
   module load

- Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
   memory leaks

Merge tag 'kvm-x86-selftests-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM selftests changes for 6.5:

- Add a test for splitting and reconstituting hugepages during and after
dirty logging

- Add support for CPU pinning in demand paging test

- Generate dependency files so that partial rebuilds work as expected

- Misc cleanups and fixes

Merge tag 'kvm-x86-pmu-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86/pmu changes for 6.5:

- Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
included along the way

Merge tag 'kvm-x86-mmu-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86/mmu changes for 6.5:

- Add back a comment about the subtle side effect of try_cmpxchg64() in
   tdp_mmu_set_spte_atomic()

- Add an assertion in __kvm_mmu_invalidate_addr() to verify that the target
   KVM MMU is the current MMU

- Add a "never" option to effectively avoid creating NX hugepage recovery
   threads

Merge tag 'kvm-x86-misc-6.5' of https://github.com/kvm-x86/linux into HEAD

KVM x86 changes for 6.5:

* Move handling of PAT out of MTRR code and dedup SVM+VMX code

* Fix output of PIC poll command emulation when there's an interrupt

* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.

* Misc cleanups

Merge tag 'kvm-x86-generic-6.5' of https://github.com/kvm-x86/linux into HEAD

Common KVM changes for 6.5:

- Fix unprotected vcpu->pid dereference via debugfs

- Fix KVM_BUG() and KVM_BUG_ON() macros with 64-bit conditionals

- Refactor failure path in kvm_io_bus_unregister_dev() to simplify the code

- Misc cleanups

Merge tag 'kvmarm-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 6.5

- Eager page splitting optimization for dirty logging, optionally
   allowing for a VM to avoid the cost of block splitting in the stage-2
   fault path.

- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
   services that live in the Secure world. pKVM intervenes on FF-A calls
   to guarantee the host doesn't misuse memory donated to the hyp or a
   pKVM guest.

- Support for running the split hypervisor with VHE enabled, known as
   'hVHE' mode. This is extremely useful for testing the split
   hypervisor on VHE-only systems, and paves the way for new use cases
   that depend on having two TTBRs available at EL2.

- Generalized framework for configurable ID registers from userspace.
   KVM/arm64 currently prevents arbitrary CPU feature set configuration
   from userspace, but the intent is to relax this limitation and allow
   userspace to select a feature set consistent with the CPU.

- Enable the use of Branch Target Identification (FEAT_BTI) in the
   hypervisor.

- Use a separate set of pointer authentication keys for the hypervisor
   when running in protected mode, as the host is untrusted at runtime.

- Ensure timer IRQs are consistently released in the init failure
   paths.

- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
   (FEAT_EVT), as it is a register commonly read from userspace.

- Erratum workaround for the upcoming AmpereOne part, which has broken
   hardware A/D state management.

As a consequence of the hVHE series reworking the arm64 software
features framework, the for-next/module-alloc branch from the arm64 tree
comes along for the ride.

Merge tag 'kvm-riscv-6.5-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.5

- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest

Merge tag 'kvm-s390-next-6.5-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

* New uvdevice secret API
* New CMM selftest
* cmm fix
* diag 9c racy access of target cpu fix

Merge tag '6.5-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client updates from Steve French:

- Deferred close fix

- Debugging improvements: display missing mount option, dump rc on
   invalidate inode failures, print client_guid in DebugData, log
   session id when matching session not found in reconnect, new dynamic
   tracepoint for session not found

- Mount fixes including: potential null dereference, and possible
   memory leak and path name parsing when double slashes

- Fix potential use after free in compounding

- Two crediting (flow control) fixes: fix for crediting leak (stress
   scenario with excess lease credits) and better locking around
   updating credits

- Three cleanups from issues pointed out by the kernel test robot

- Session state check improvements (including for potential use after
   free)

- DFS fixes: Fix for getattr on link when DFS disabled, fix for DFS
   mounts to same share with different prefix paths, DFS mount error
   checking improvement

* tag '6.5-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: new dynamic tracepoint to track ses not found errors
  cifs: log session id when a matching ses is not found
  smb: client: improve DFS mount check
  smb: client: fix shared DFS root mounts with different prefixes
  smb: client: fix parsing of source mount option
  smb: client: fix broken file attrs with nodfs mounts
  cifs: print client_guid in DebugData
  cifs: fix session state check in smb2_find_smb_ses
  cifs: fix session state check in reconnect to avoid use-after-free issue
  cifs: do all necessary checks for credits within or before locking
  cifs: prevent use-after-free by freeing the cfile later
  smb: client: fix warning in generic_ip_connect()
  smb: client: fix warning in CIFSFindNext()
  smb: client: fix warning in CIFSFindFirst()
  smb3: do not reserve too many oplock credits
  cifs: print more detail when invalidate_inode_mapping fails
  smb: client: fix warning in cifs_smb3_do_mount()
  smb: client: fix warning in cifs_match_super()
  cifs: print nosharesock value while dumping mount options
  SMB3: Do not send lease break acknowledgment if all file handles have been closed

Merge tag '6.5-rc-ksmbd-server-fixes-part1' of git://git.samba.org/ksmbd

Pull ksmbd server updates from Steve French:

- two fixes for compounding bugs (make sure no out of bound reads with
   less common combinations of commands in the compound)

- eight minor cleanup patches (e.g. simplifying return values, replace
   one element array, use of kzalloc where simpler)

- fix for clang warning on possible overflow in filename conversion

* tag '6.5-rc-ksmbd-server-fixes-part1' of git://git.samba.org/ksmbd:
  ksmbd: avoid field overflow warning
  ksmbd: Replace one-element array with flexible-array member
  ksmbd: Use struct_size() helper in ksmbd_negotiate_smb_dialect()
  ksmbd: add missing compound request handing in some commands
  ksmbd: fix out of bounds read in smb2_sess_setup
  ksmbd: Replace the ternary conditional operator with min()
  ksmbd: use kvzalloc instead of kvmalloc
  ksmbd: Change the return value of ksmbd_vfs_query_maximal_access to void
  ksmbd: return a literal instead of 'err' in ksmbd_vfs_kern_path_locked()
  ksmbd: use kzalloc() instead of __GFP_ZERO
  ksmbd: remove unused ksmbd_tree_conn_share function

Merge tag 'nfsd-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

- Fix ordering of attributes in NFSv4 GETATTR replies

* tag 'nfsd-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: Fix creation time serialization order

Merge tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching

Pull livepatching update from Petr Mladek:

- Make a variable static to fix a sparse warning

* tag 'livepatching-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Make 'klp_stack_entries' static

Merge tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
"Although some more stuff is brewing, the EFI changes that are ready
  for mainline are few this cycle:

   - improve the PCI DMA paranoia logic in the EFI stub

   - some constification changes

   - add statfs support to efivarfs

   - allow user space to enumerate updatable firmware resources without
     CAP_SYS_ADMIN"

* tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/libstub: Disable PCI DMA before grabbing the EFI memory map
  efi/esrt: Allow ESRT access without CAP_SYS_ADMIN
  efivarfs: expose used and total size
  efi: make kobj_type structure constant
  efi: x86: make kobj_type structure constant

Merge tag 'v6.5-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
"API:
   - Add linear akcipher/sig API
   - Add tfm cloning (hmac, cmac)
   - Add statesize to crypto_ahash

  Algorithms:
   - Allow only odd e and restrict value in FIPS mode for RSA
   - Replace LFSR with SHA3-256 in jitter
   - Add interface for gathering of raw entropy in jitter

  Drivers:
   - Fix race on data_avail and actual data in hwrng/virtio
   - Add hash and HMAC support in starfive
   - Add RSA algo support in starfive
   - Add support for PCI device 0x156E in ccp"

* tag 'v6.5-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (85 commits)
  crypto: akcipher - Do not copy dst if it is NULL
  crypto: sig - Fix verify call
  crypto: akcipher - Set request tfm on sync path
  crypto: sm2 - Provide sm2_compute_z_digest when sm2 is disabled
  hwrng: imx-rngc - switch to DEFINE_SIMPLE_DEV_PM_OPS
  hwrng: st - keep clock enabled while hwrng is registered
  hwrng: st - support compile-testing
  hwrng: imx-rngc - fix the timeout for init and self check
  KEYS: asymmetric: Use new crypto interface without scatterlists
  KEYS: asymmetric: Move sm2 code into x509_public_key
  KEYS: Add forward declaration in asymmetric-parser.h
  crypto: sig - Add interface for sign/verify
  crypto: akcipher - Add sync interface without SG lists
  crypto: cipher - On clone do crypto_mod_get()
  crypto: api - Add __crypto_alloc_tfmgfp
  crypto: api - Remove crypto_init_ops()
  crypto: rsa - allow only odd e and restrict value in FIPS mode
  crypto: geniv - Split geniv out of AEAD Kconfig option
  crypto: algboss - Add missing dependency on RNG2
  crypto: starfive - Add RSA algo support
  ...

xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion

It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:

    config PFAULT
        bool "Handle protection faults" if EXPERT && !MMU
        default y
        help
          Handle protection faults. MMU configurations must enable it.
          noMMU configurations may disable it if used memory map never
          generates protection faults or faults are always fatal.

          If unsure, say Y.

which completely violated my expectations of the page fault handling.

End result: Guenter reports that the xtensa no-MMU builds all fail with

  arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
  arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’

because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.

Doing so is simple enough, and fixes the problem.

Reported-and-tested-by: Guenter Roeck <[email protected]>
Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
Signed-off-by: Linus Torvalds <[email protected]>

Merge tag 'md-fixes-20230630' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.5

Pull MD fix from Song:

"This patch fixes data corruption caused by discard on raid0 array with
original layout."

* tag 'md-fixes-20230630' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid0: add discard support for the 'original' layout

f2fs: fix to do sanity check on direct node in truncate_dnode()

syzbot reports below bug:

BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000

CPU: 1 PID: 5000 Comm: syz-executor148 Not tainted 6.4.0-rc7-syzkaller-00041-ge660abd551f1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:351
print_report mm/kasan/report.c:462 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:572
f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
truncate_dnode+0x229/0x2e0 fs/f2fs/node.c:944
f2fs_truncate_inode_blocks+0x64b/0xde0 fs/f2fs/node.c:1154
f2fs_do_truncate_blocks+0x4ac/0xf30 fs/f2fs/file.c:721
f2fs_truncate_blocks+0x7b/0x300 fs/f2fs/file.c:749
f2fs_truncate.part.0+0x4a5/0x630 fs/f2fs/file.c:799
f2fs_truncate include/linux/fs.h:825 [inline]
f2fs_setattr+0x1738/0x2090 fs/f2fs/file.c:1006
notify_change+0xb2c/0x1180 fs/attr.c:483
do_truncate+0x143/0x200 fs/open.c:66
handle_truncate fs/namei.c:3295 [inline]
do_open fs/namei.c:3640 [inline]
path_openat+0x2083/0x2750 fs/namei.c:3791
do_filp_open+0x1ba/0x410 fs/namei.c:3818
do_sys_openat2+0x16d/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_creat fs/open.c:1448 [inline]
__se_sys_creat fs/open.c:1442 [inline]
__x64_sys_creat+0xcd/0x120 fs/open.c:1442
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.

This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.

Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.

Reported-and-tested-by: [email protected]
Closes: https://lore.kernel.org/linux-f2fs-devel/[email protected]
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: only set release for file that has compressed data

If a file is not comprssed yet or does not have compressed data,
for example, its data has a very low compression ratio, do not
set FI_COMPRESS_RELEASED flag.

Signed-off-by: Sheng Yong <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix compile warning in f2fs_destroy_node_manager()

fs/f2fs/node.c: In function ‘f2fs_destroy_node_manager’:
fs/f2fs/node.c:3390:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
3390 | }

Merging below pointer arrays into common one, and reuse it by cast type.

struct nat_entry *natvec[NATVEC_SIZE];
struct nat_entry_set *setvec[SETVEC_SIZE];

Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix error path handling in truncate_dnode()

If truncate_node() fails in truncate_dnode(), it missed to call
f2fs_put_page(), fix it.

Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()")
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

f2fs: fix deadlock in i_xattr_sem and inode page lock

Thread #1:

[122554.641906][   T92]  f2fs_getxattr+0xd4/0x5fc
    -> waiting for f2fs_down_read(&F2FS_I(inode)->i_xattr_sem);

[122554.641927][   T92]  __f2fs_get_acl+0x50/0x284
[122554.641948][   T92]  f2fs_init_acl+0x84/0x54c
[122554.641969][   T92]  f2fs_init_inode_metadata+0x460/0x5f0
[122554.641990][   T92]  f2fs_add_inline_entry+0x11c/0x350
    -> Locked dir->inode_page by f2fs_get_node_page()

[122554.642009][   T92]  f2fs_do_add_link+0x100/0x1e4
[122554.642025][   T92]  f2fs_create+0xf4/0x22c
[122554.642047][   T92]  vfs_create+0x130/0x1f4

Thread #2:

[123996.386358][   T92]  __get_node_page+0x8c/0x504
    -> waiting for dir->inode_page lock

[123996.386383][   T92]  read_all_xattrs+0x11c/0x1f4
[123996.386405][   T92]  __f2fs_setxattr+0xcc/0x528
[123996.386424][   T92]  f2fs_setxattr+0x158/0x1f4
    -> f2fs_down_write(&F2FS_I(inode)->i_xattr_sem);

[123996.386443][   T92]  __f2fs_set_acl+0x328/0x430
[123996.386618][   T92]  f2fs_set_acl+0x38/0x50
[123996.386642][   T92]  posix_acl_chmod+0xc8/0x1c8
[123996.386669][   T92]  f2fs_setattr+0x5e0/0x6bc
[123996.386689][   T92]  notify_change+0x4d8/0x580
[123996.386717][   T92]  chmod_common+0xd8/0x184
[123996.386748][   T92]  do_fchmodat+0x60/0x124
[123996.386766][   T92]  __arm64_sys_fchmodat+0x28/0x3c

Cc: <[email protected]>
Fixes: 27161f13e3c3 "f2fs: avoid race in between read xattr & write xattr"
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>

sysctl: set variable sysctl_mount_point storage-class-specifier to static

smatch reports
fs/proc/proc_sysctl.c:32:18: warning: symbol
'sysctl_mount_point' was not declared. Should it be static?

This variable is only used in its defining file, so it should be static.

Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>

md/raid0: add discard support for the 'original' layout

We've found that using raid0 with the 'original' layout and discard
enabled with different disk sizes (such that at least two zones are
created) can result in data corruption. This is due to the fact that
the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
layout. We've seen this corruption using ext4 but other filesystems are
likely susceptible as well.

More specifically, while multiple zones are necessary to create the
corruption, the corruption may not occur with multiple zones if they
layout in such a way the layout matches what the 'alternate' layout
would have produced. Thus, not all raid0 devices with the 'original'
layout, different size disks and discard enabled will encounter this
corruption.

The 3.14 kernel inadvertently changed the raid0 disk layout for different
size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
same raid0 array could corrupt data. This lead to the creation of the
'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
(to match the post 3.14 layout) in the 5.4 kernel time frame and an option
to tell the kernel which layout to use (since it couldn't be autodetected).
However, when the 'original' layout was added back to 5.4 discard support
for the 'original' layout was not added leading this issue.

I've been able to reliably reproduce the corruption with the following
test case:

1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)

Let's fix this by adding proper discard support to the 'original' layout.
The fix 'maps' the 'original' layout disks to the order in which they are
read/written such that we can compare the disks in the same way that the
current 'alternate' layout does. A 'disk_shift' field is added to
'struct strip_zone'. This could be computed on the fly in
raid0_handle_discard() but by adding this field, we save some computation
in the discard path.

Note we could also potentially fix this by re-ordering the disks in the
zones that follow the first one, and then always read/writing them using
the 'alternate' layout. However, that is seen as a more substantial change,
and we are attempting the least invasive fix at this time to remedy the
corruption.

I've verified the change using the reproducer mentioned above. Typically,
the corruption is seen after less than 3 iterations, while the patch has
run 500+ iterations.

Cc: NeilBrown <[email protected]>
Cc: Song Liu <[email protected]>
Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: [email protected]
Signed-off-by: Jason Baron <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0

Sec proxy/message manager data buffer is 60 bytes with the last of the
registers indicating transmission completion. This however poses a bit
of a challenge.

The backing memory for sec_proxy / message manager is regular memory,
and all sec proxy does is to trigger a burst of all 60 bytes of data
over to the target thread backing ring accelerator. It doesn't do a
memory scrub when it moves data out in the burst. When we transmit
multiple messages, remnants of previous message is also transmitted
which results in some random data being set in TISCI fields of
messages that have been expanded forward.

The entire concept of backward compatibility hinges on the fact that
the unused message fields remain 0x0 allowing for 0x0 value to be
specially considered when backward compatibility of message extension
is done.

So, instead of just writing the completion register, we continue
to fill the message buffer up with 0x0 (note: for partial message
involving completion, we already do this).

This allows us to scale and introduce ABI changes back also work with
other boot stages that may have left data in the internal memory.

While at this, be consistent and explicit with the data_reg pointer
increment.

Fixes: aace66b170ce ("mailbox: Introduce TI message manager driver")
Signed-off-by: Nishanth Menon <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

mailbox: tegra: add support for Tegra264

Tegra264 has a slightly different doorbell register layout than
previous chips.

Acked-by: Thierry Reding <[email protected]>
Signed-off-by: Stefan Kristiansson <[email protected]>
Signed-off-by: Peter De Schrijver <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

dt-bindings: mailbox: tegra: Document Tegra264 HSP

Add the compatible string for the HSP block found on the Tegra264 SoC.
The HSP block in Tegra264 is not register compatible with the one in
Tegra194 or Tegra234 hence there is no fallback compatibility string.

Acked-by: Krzysztof Kozlowski <[email protected]>
Acked-by: Thierry Reding <[email protected]>
Signed-off-by: Peter De Schrijver <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

dt-bindings: mailbox: convert bcm2835-mbox bindings to YAML

Convert the DT binding document for bcm2835-mbox from .txt to YAML.

Signed-off-by: Stefan Wahren <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

dt-bindings: mailbox: qcom: Add IPQ5018 APCS compatible

Add compatible for the Qualcomm IPQ5018 APCS block.

Signed-off-by: Manikanta Mylavarapu <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Jassi Brar <[email protected]>

Merge tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Adjust log levels for common messages (Oleksandr Natalenko, Alex
   Williamson)

- Support for dynamic MSI-X allocation (Reinette Chatre)

- Enable and report PCIe AtomicOp Completer capabilities (Alex
   Williamson)

- Cleanup Kconfigs for vfio bus drivers (Alex Williamson)

- Add support for CDX bus based devices (Nipun Gupta)

- Fix race with concurrent mdev initialization (Eric Farman)

* tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio:
  vfio/mdev: Move the compat_class initialization to module init
  vfio/cdx: add support for CDX bus
  vfio/fsl: Create Kconfig sub-menu
  vfio/platform: Cleanup Kconfig
  vfio/pci: Cleanup Kconfig
  vfio/pci-core: Add capability for AtomicOp completer support
  vfio/pci: Also demote hiding standard cap messages
  vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
  vfio/pci: Support dynamic MSI-X
  vfio/pci: Probe and store ability to support dynamic MSI-X
  vfio/pci: Use bitfield for struct vfio_pci_core_device flags
  vfio/pci: Update stale comment
  vfio/pci: Remove interrupt context counter
  vfio/pci: Use xarray for interrupt context storage
  vfio/pci: Move to single error path
  vfio/pci: Prepare for dynamic interrupt context storage
  vfio/pci: Remove negative check on unsigned vector
  vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
  vfio/pci: demote hiding ecap messages to debug level

Merge tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci updates from Bjorn Helgaas:
"Enumeration:

   - Export pcie_retrain_link() for use outside ASPM

   - Add Data Link Layer Link Active Reporting as another way for
     pcie_retrain_link() to determine the link is up

   - Work around link training failures (especially on the ASMedia
     ASM2824 switch) by training first at 2.5GT/s and then attempting
     higher rates

  Resource management:

   - When we coalesce host bridge windows, remove invalidated resources
     from the resource tree so future allocations work correctly

  Hotplug:

   - Cancel bringup sequence if card is not present, to keep from
     blinking Power Indicator indefinitely

   - Reassign bridge resources if necessary for ACPI hotplug

  Driver binding:

   - Convert platform_device .remove() callbacks to return void instead
     of a mostly useless int

  Power management:

   - Reduce wait time for secondary bus to be ready to speed up resume

   - Avoid putting EloPOS E2/S2/H2 (as well as Elo i2) PCIe Ports in
     D3cold

   - Call _REG when transitioning D-states so AML that uses the PCI
     config space OpRegion works, which fixes some ASMedia GPIO
     controllers after resume

  Virtualization:

   - Delay extra 250ms after FLR of Solidigm P44 Pro NVMe to avoid KVM
     hang when guest is rebooted

   - Add function 1 DMA alias quirk for Marvell 88SE9235

  Error handling:

   - Unexport pci_save_aer_state() since it's only used in drivers/pci/

   - Drop recommendation for drivers to configure AER Capability, since
     the PCI core does this for all devices

  ASPM:

   - Disable ASPM on MFD function removal to avoid use-after-free

   - Tighten up pci_enable_link_state() and pci_disable_link_state()
     interfaces so they don't enable/disable states the driver didn't
     specify

   - Avoid link retraining race that can happen if ASPM sets link
     control parameters while the link is in the midst of training for
     some other reason

  Endpoint framework:

   - Change "PCI Endpoint Virtual NTB driver" Kconfig prompt to be
     different from "PCI Endpoint NTB driver"

   - Automatically create a function specific attributes group for
     endpoint drivers to avoid reference counting issues

   - Fix many EPC test issues

   - Return pci_epf_type_add_cfs() error if EPF has no driver

   - Add kernel-doc for pci_epc_raise_irq() and pci_epc_map_msi_irq()
     MSI vector parameters

   - Pass EPF device ID to driver probe functions

   - Return -EALREADY if EPC has already been started/stopped

   - Add linkdown notifier support and use it in qcom-ep

   - Add Bus Master Enable event support and use it in qcom-ep

   - Add Qualcomm Modem Host Interface (MHI) endpoint driver

   - Add Layerscape PME interrupt handling to manage link-up
     notification

  Cadence PCIe controller driver:

   - Wait for link retrain to complete when working around the J721E
     i2085 erratum with Gen2 mode

  Faraday FTPC100 PCI controller driver:

   - Release clock resources on error paths

  Freescale i.MX6 PCIe controller driver:

   - Save and restore Root Port MSI control to work around hardware defect

  Intel VMD host bridge driver:

   - Reset VMD config register between soft reboots

   - Capture pci_reset_bus() return value instead of printing junk when
     it fails

  Qualcomm PCIe controller driver:

   - Add SDX65 endpoint compatible string to DT binding

   - Disable register write access after init for IP v2.3.3, v2.9.0

   - Use DWC helpers for enabling/disabling writes to DBI registers

   - Hide slot hotplug capability for IP v1.0.0, v1.9.0, v2.1.0, v2.3.2,
     v2.3.3, v2.7.0, v2.9.0

   - Reuse v2.3.2 post-init sequence for v2.4.0

  Renesas R-Car PCIe controller driver:

   - Remove unused static pcie_base and pcie_dev

  Rockchip PCIe controller driver:

   - Remove writes to unused registers

   - Write endpoint Device ID using correct register

   - Assert PCI Configuration Enable bit after probe so endpoint
     responds instead of generating Request Retry Status messages

   - Poll waiting for PHY PLLs to lock

   - Update RK3399 example DT binding to be valid

   - Use RK3399 PCIE_CLIENT_LEGACY_INT_CTRL to generate INTx instead of
     manually generating PCIe message

   - Use multiple windows to avoid address translation conflicts

   - Use u32 (not u16) when accessing 32-bit registers

   - Hide MSI-X Capability, since RK3399 can't generate MSI-X

   - Set endpoint controller required alignment to 256

  Synopsys DesignWare PCIe controller driver:

   - Wait for link to come up only if we've initiated link training

  Miscellaneous:

   - Add pci_clear_master() stub for non-CONFIG_PCI"

* tag 'pci-v6.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (116 commits)
  Documentation: PCI: correct spelling
  PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain()
  PCI: xgene-msi: Convert to platform remove callback returning void
  PCI: tegra: Convert to platform remove callback returning void
  PCI: rockchip-host: Convert to platform remove callback returning void
  PCI: mvebu: Convert to platform remove callback returning void
  PCI: mt7621: Convert to platform remove callback returning void
  PCI: mediatek-gen3: Convert to platform remove callback returning void
  PCI: mediatek: Convert to platform remove callback returning void
  PCI: iproc: Convert to platform remove callback returning void
  PCI: hisi-error: Convert to platform remove callback returning void
  PCI: dwc: Convert to platform remove callback returning void
  PCI: j721e: Convert to platform remove callback returning void
  PCI: brcmstb: Convert to platform remove callback returning void
  PCI: altera-msi: Convert to platform remove callback returning void
  PCI: altera: Convert to platform remove callback returning void
  PCI: aardvark: Convert to platform remove callback returning void
  PCI: rcar: Use correct product family name for Renesas R-Car
  PCI: layerscape: Add the endpoint linkup notifier support
  PCI: endpoint: pci-epf-vntb: Fix typo in comments
  ...

Merge tag 'pinctrl-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
"No core changes this time

  New drivers:

   - Tegra234 support

   - Qualcomm IPQ5018 support

   - Intel Meteor Lake-S support

   - Qualcomm SDX75 subdriver

   - Qualcomm SPMI-based PM8953 support

  Improvements:

   - Fix up support for GPIO3 on the AXP209

   - Push-pull drive configuration support for the AT91 PIO4

   - Fix misc non-urgent bugs in the AMD driver

   - Misc non-urgent improved error handling

   - Misc janitorial and minor improvements"

* tag 'pinctrl-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (75 commits)
  pinctrl: cherryview: Drop goto label
  pinctrl: baytrail: invert if condition
  pinctrl: baytrail: add warning for BYT_VAL_REG retrieval failure
  pinctrl: baytrail: reduce scope of spinlock in ->dbg_show() hook
  pinctrl: tegra: avoid duplicate field initializers
  dt-bindings: pinctrl: qcom,sdx65-tlmm: add pcie_clkreq function
  pinctrl: mlxbf3: remove broken Kconfig 'select'
  pinctrl: spear: Remove unused of_gpio.h inclusion
  pinctrl: lantiq: Remove unused of_gpio.h inclusion
  pinctrl: at91-pio4: check return value of devm_kasprintf()
  pinctrl: microchip-sgpio: check return value of devm_kasprintf()
  pinctrl: freescale: Fix a memory out of bounds when num_configs is 1
  pinctrl: intel: refine ->irq_set_type() hook
  pinctrl: intel: refine ->set_mux() hook
  pinctrl: baytrail: Use str_hi_lo() helper
  lib/string_choices: Add str_high_low() helper
  lib/string_helpers: Split out string_choices.h
  lib/string_helpers: Add missing header files to MAINTAINERS database
  pinctrl: npcm7xx: Add missing check for ioremap
  pinctrl:sunplus: Add check for kmalloc
  ...

Merge tag 'platform-drivers-x86-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:
"AMD PMC and PMF drivers:
   - Various bugfixes
   - Improved debugging support

  Intel PMC:
   - Refactor to support hw with multiple PMCs
   - Various other improvements / new hw support

  Intel Speed Select Technology (ISST):
   - TPMI Uncore Frequency + Cluster Level Power Controls
   - Various bugfixes
   - tools/intel-speed-select: Misc improvements

  Dell-DDV: Add documentation

  INT3472 ACPI camera sensor glue code:
   - Evaluate device's _DSM method to control imaging clock
   - Drop the need to have a table with per sensor-model info

  Lenovo Yogabook:
   - Refactor / rework to also support Android models

  Think-LMI:
   - Multiple improvements and fixes

  WMI:
   - Add proper API documentation for the WMI bus

  x86-android-tablets:
   - Misc new hw support

  Miscellaneous other cleanups / fixes"

* tag 'platform-drivers-x86-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (91 commits)
  platform/x86:intel/pmc: Add Meteor Lake IOE-M PMC related maps
  platform/x86:intel/pmc: Add Meteor Lake IOE-P PMC related maps
  platform/x86:intel/pmc: Use SSRAM to discover pwrm base address of primary PMC
  platform/x86:intel/pmc: Discover PMC devices
  platform/x86:intel/pmc: Enable debugfs multiple PMC support
  platform/x86:intel/pmc: Add support to handle multiple PMCs
  platform/x86:intel/pmc: Combine core_init() and core_configure()
  platform/x86:intel/pmc: Update maps for Meteor Lake P/M platforms
  platform/x86/intel: tpmi: Remove hardcoded unit and offset
  platform/x86: int3472: discrete: Log a warning if the pin-numbers don't match
  platform/x86: int3472: discrete: Use FIELD_GET() on the GPIO _DSM return value
  platform/x86: int3472: discrete: Add alternative "AVDD" regulator supply name
  platform/x86: int3472: discrete: Add support for 1 GPIO regulator shared between 2 sensors
  platform/x86: int3472: discrete: Remove sensor_config-s
  platform/x86: int3472: discrete: Drop GPIO remapping support
  platform/x86: apple-gmux: don't use be32_to_cpu and cpu_to_be32
  platform/x86/dell/dell-rbtn: Fix resources leaking on error path
  platform/x86: ISST: Fix usage counter
  platform/x86: ISST: Reset default callback on unregister
  platform/x86: int3472: Switch back to use struct i2c_driver's .probe()
  ...