Mike Christie [Mon, 26 Jun 2023 23:23:01 +0000 (18:23 -0500)]
vhost_scsi: flush IO vqs then send TMF rsp
With one worker we will always send the scsi cmd responses then send the
TMF rsp, because LIO will always complete the scsi cmds first then call
into us to send the TMF response.
With multiple workers, the IO vq workers could be running while the
TMF/ctl vq worker is running so this has us do a flush before completing
the TMF to make sure cmds are completed when it's work is later queued
and run.
Mike Christie [Mon, 26 Jun 2023 23:22:59 +0000 (18:22 -0500)]
vhost_scsi: make SCSI cmd completion per vq
This patch separates the scsi cmd completion code paths so we can complete
cmds based on their vq instead of having all cmds complete on the same
worker/CPU. This will be useful with the next patches that allow us to
create mulitple worker threads and bind them to different vqs, and we can
have completions running on different threads/CPUs.
Mike Christie [Mon, 26 Jun 2023 23:22:57 +0000 (18:22 -0500)]
vhost: convert poll work to be vq based
This has the drivers pass in their poll to vq mapping and then converts
the core poll code to use the vq based helpers. In the next patches we
will allow vqs to be handled by different workers, so to allow drivers
to execute operations like queue, stop, flush, etc on specific polls/vqs
we need to know the mappings.
Mike Christie [Mon, 26 Jun 2023 23:22:56 +0000 (18:22 -0500)]
vhost: take worker or vq for flushing
This patch has the core work flush function take a worker. When we
support multiple workers we can then flush each worker during device
removal, stoppage, etc. It also adds a helper to flush specific
virtqueues, so vhost-scsi can flush IO vqs from it's ctl vq.
Mike Christie [Mon, 26 Jun 2023 23:22:55 +0000 (18:22 -0500)]
vhost: take worker or vq instead of dev for queueing
This patch has the core work queueing function take a worker for when we
support multiple workers. It also adds a helper that takes a vq during
queueing so modules can control which vq/worker to queue work on.
This temp leaves vhost_work_queue. It will be removed when the drivers
are converted in the next patches.
Mike Christie [Mon, 26 Jun 2023 23:22:54 +0000 (18:22 -0500)]
vhost, vhost_net: add helper to check if vq has work
In the next patches each vq might have different workers so one could
have work but others do not. For net, we only want to check specific vqs,
so this adds a helper to check if a vq has work pending and converts
vhost-net to use it.
Mike Christie [Mon, 26 Jun 2023 23:22:53 +0000 (18:22 -0500)]
vhost: add vhost_worker pointer to vhost_virtqueue
This patchset allows userspace to map vqs to different workers. This
patch adds a worker pointer to the vq so in later patches in this set
we can queue/flush specific vqs and their workers.
Mike Christie [Mon, 26 Jun 2023 23:22:52 +0000 (18:22 -0500)]
vhost: dynamically allocate vhost_worker
This patchset allows us to allocate multiple workers, so this has us
move from the vhost_worker that's embedded in the vhost_dev to
dynamically allocating it.
Mike Christie [Mon, 26 Jun 2023 23:22:51 +0000 (18:22 -0500)]
vhost: create worker at end of vhost_dev_set_owner
vsock can start queueing work after VHOST_VSOCK_SET_GUEST_CID, so
after we have called vhost_worker_create it can be calling
vhost_work_queue and trying to access the vhost worker/task. If
vhost_dev_alloc_iovecs fails, then vhost_worker_free could free
the worker/task from under vsock.
This moves vhost_worker_create to the end of vhost_dev_set_owner
where we know we can no longer fail in that path. If it fails
after the VHOST_SET_OWNER and userspace closes the device, then
the normal vsock release handling will do the right thing.
Xianting Tian [Fri, 9 Jun 2023 13:18:17 +0000 (21:18 +0800)]
virtio_bt: call scheduler when we free unused buffs
For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").
This driver is similar so theoretically the same logic applies.
Xianting Tian [Fri, 9 Jun 2023 13:18:16 +0000 (21:18 +0800)]
virtio-console: call scheduler when we free unused buffs
For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").
This driver is similar so theoretically the same logic applies.
Xianting Tian [Fri, 9 Jun 2023 13:18:15 +0000 (21:18 +0800)]
virtio-crypto: call scheduler when we free unused buffs
For virtio-net we were getting CPU stall warnings, and fixed it by
calling the scheduler: see f8bb51043945 ("virtio_net: suppress cpu stall
when free_unused_bufs").
This driver is similar so theoretically the same logic applies.
ovl: move all parameter handling into params.{c,h}
While initially I thought that we couldn't move all new mount api
handling into params.{c,h} it turns out it is possible. So this just
moves a good chunk of code out of super.c and into params.{c,h}.
Daniel Thompson [Fri, 30 Jun 2023 20:12:06 +0000 (21:12 +0100)]
kdb: move kdb_send_sig() declaration to a better header file
kdb_send_sig() is defined in the signal code and called from kdb,
but the declaration is part of the kdb internal code.
Move the declaration to the shared header to avoid the warning:
kernel/signal.c:4789:6: error: no previous prototype for 'kdb_send_sig' [-Werror=missing-prototypes]
Eric Dumazet [Thu, 29 Jun 2023 16:41:50 +0000 (16:41 +0000)]
tcp: annotate data races in __tcp_oow_rate_limited()
request sockets are lockless, __tcp_oow_rate_limited() could be called
on the same object from different cpus. This is harmless.
Add READ_ONCE()/WRITE_ONCE() annotations to avoid a KCSAN report.
Fixes: 4ce7e93cb3fe ("tcp: rate limit ACK sent by SYN_RECV request sockets") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Mon, 3 Jul 2023 08:17:52 +0000 (09:17 +0100)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 6.4.2/6.5-rc1
Sorry to send these patches during the merge window, but they're net
fixes, not netdev enhancements, and while I'd ordinarily wait anyway,
I just got a first bug report for one of these fixes, which I originally
had thought was mostly unlikely. So please apply the following three
patches to net:
1) Make proper use of nr_cpu_ids with cpumask_next(), rather than
awkwardly using modulo, to handle dynamic CPU topology changes.
Linus noticed this a while ago and pointed it out, and today a user
actually got hit by it.
2) Respect persistent keepalive and other staged packets when setting
the private key after the interface is already up.
3) Use timer_delete_sync() instead of del_timer_sync(), per the
documentation.
====================
wireguard: netlink: send staged packets when setting initial private key
Packets bound for peers can queue up prior to the device private key
being set. For example, if persistent keepalive is set, a packet is
queued up to be sent as soon as the device comes up. However, if the
private key hasn't been set yet, the handshake message never sends, and
no timer is armed to retry, since that would be pointless.
But, if a user later sets a private key, the expectation is that those
queued packets, such as a persistent keepalive, are actually sent. So
adjust the configuration logic to account for this edge case, and add a
test case to make sure this works.
Maxim noticed this with a wg-quick(8) config to the tune of:
[Interface]
PostUp = wg set %i private-key somefile
wireguard: queueing: use saner cpu selection wrapping
Using `% nr_cpumask_bits` is slow and complicated, and not totally
robust toward dynamic changes to CPU topologies. Rather than storing the
next CPU in the round-robin, just store the last one, and also return
that value. This simplifies the loop drastically into a much more common
pattern.
Each sample script sources functions.sh before parameters.sh
which makes $APPEND undefined when trapping EXIT no matter in
append mode or not. Due to this when sample scripts finished
they always do "pgctrl reset" which resets pktgen config.
So move trap to each script after sourcing parameters.sh
and trap EXIT explicitly.
Daniel Díaz [Sat, 1 Jul 2023 04:41:03 +0000 (22:41 -0600)]
selftests/net: Add xt_policy config for xfrm_policy test
When running Kselftests with the current selftests/net/config
the following problem can be seen with the net:xfrm_policy.sh
selftest:
# selftests: net: xfrm_policy.sh
[ 41.076721] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.094787] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 41.107635] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# modprobe: FATAL: Module ip_tables not found in directory /lib/modules/6.1.36
# iptables v1.8.7 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
# Perhaps iptables or your kernel needs to be upgraded.
# SKIP: Could not insert iptables rule
ok 1 selftests: net: xfrm_policy.sh # SKIP
This is because IPsec "policy" match support is not available
to the kernel.
This patch adds CONFIG_NETFILTER_XT_MATCH_POLICY as a module
to the selftests/net/config file, so that `make
kselftest-merge` can take this into consideration.
Fixes: 66e4c8d95008 ("net: warn if transport header was not set") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Fri, 30 Jun 2023 22:20:10 +0000 (01:20 +0300)]
net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode
There was a regression introduced by the blamed commit, where pinging to
a VLAN-unaware bridge would fail with the repeated message "Couldn't
decode source port" coming from the tagging protocol driver.
When receiving packets with a bridge_vid as determined by
dsa_tag_8021q_bridge_join(), dsa_8021q_rcv() will decode:
- source_port = 0 (which isn't really valid, more like "don't know")
- switch_id = 0 (which isn't really valid, more like "don't know")
- vbid = value in range 1-7
Since the blamed patch has reversed the order of the checks, we are now
going to believe that source_port != -1 and switch_id != -1, so they're
valid, but they aren't.
The minimal solution to the problem is to only populate source_port and
switch_id with what dsa_8021q_rcv() came up with, if the vbid is zero,
i.e. the source port information is trustworthy.
Fixes: c1ae02d87689 ("net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Vladimir Oltean [Fri, 30 Jun 2023 16:41:18 +0000 (19:41 +0300)]
net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
According to the synchronization rules for .ndo_get_stats() as seen in
Documentation/networking/netdevices.rst, acquiring a plain spin_lock()
should not be illegal, but the bridge driver implementation makes it so.
After running these commands, I am being faced with the following
lockdep splat:
$ ip link add link swp0 name macsec0 type macsec encrypt on && ip link set swp0 up
$ ip link add dev br0 type bridge vlan_filtering 1 && ip link set br0 up
$ ip link set macsec0 master br0 && ip link set macsec0 up
========================================================
WARNING: possible irq lock inversion dependency detected 6.4.0-04295-g31b577b4bd4a #603 Not tainted
--------------------------------------------------------
swapper/1/0 just changed the state of lock: ffff6bd348724cd8 (&br->lock){+.-.}-{3:3}, at: br_forward_delay_timer_expired+0x34/0x198
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&ocelot->stats_lock){+.+.}-{3:3}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:
&br->lock --> &br->hash_lock --> &ocelot->stats_lock
swp0 is instantiated by drivers/net/dsa/ocelot/felix.c, and this
only matters to the extent that its .ndo_get_stats64() method calls
spin_lock(&ocelot->stats_lock).
Documentation/locking/lockdep-design.rst says:
| A lock is irq-safe means it was ever used in an irq context, while a lock
| is irq-unsafe means it was ever acquired with irq enabled.
(...)
| Furthermore, the following usage based lock dependencies are not allowed
| between any two lock-classes::
|
| <hardirq-safe> -> <hardirq-unsafe>
| <softirq-safe> -> <softirq-unsafe>
Lockdep marks br->hash_lock as softirq-safe, because it is sometimes
taken in softirq context (for example br_fdb_update() which runs in
NET_RX softirq), and when it's not in softirq context it blocks softirqs
by using spin_lock_bh().
Lockdep marks ocelot->stats_lock as softirq-unsafe, because it never
blocks softirqs from running, and it is never taken from softirq
context. So it can always be interrupted by softirqs.
There is a call path through which a function that holds br->hash_lock:
fdb_add_hw_addr() will call a function that acquires ocelot->stats_lock:
ocelot_port_get_stats64(). This can be seen below:
The macsec0 bridge port is created without p->flags & BR_PROMISC,
because it is what br_manage_promisc() decides for a VLAN filtering
bridge with a single auto port.
As part of the br_add_if() procedure, br_fdb_add_local() is called for
the MAC address of the device, and this results in a call to
dev_uc_add() for macsec0 while the softirq-safe br->hash_lock is taken.
Because macsec0 does not have IFF_UNICAST_FLT, dev_uc_add() ends up
calling __dev_set_promiscuity() for macsec0, which is propagated by its
implementation, macsec_dev_change_rx_flags(), to the lower device: swp0.
This triggers the call path:
with a calling context that lockdep doesn't like (br->hash_lock held).
Normally we don't see this, because even though many drivers that can be
bridge ports don't support IFF_UNICAST_FLT, we need a driver that
(a) doesn't support IFF_UNICAST_FLT, *and*
(b) it forwards the IFF_PROMISC flag to another driver, and
(c) *that* driver implements ndo_get_stats64() using a softirq-unsafe
spinlock.
Condition (b) is necessary because the first __dev_set_rx_mode() calls
__dev_set_promiscuity() with "bool notify=false", and thus, the
rtmsg_ifinfo() code path won't be entered.
The same criteria also hold true for DSA switches which don't report
IFF_UNICAST_FLT. When the DSA master uses a spin_lock() in its
ndo_get_stats64() method, the same lockdep splat can be seen.
I think the deadlock possibility is real, even though I didn't reproduce
it, and I'm thinking of the following situation to support that claim:
fdb_add_hw_addr() runs on a CPU A, in a context with softirqs locally
disabled and br->hash_lock held, and may end up attempting to acquire
ocelot->stats_lock.
In parallel, ocelot->stats_lock is currently held by a thread B (say,
ocelot_check_stats_work()), which is interrupted while holding it by a
softirq which attempts to lock br->hash_lock.
Thread B cannot make progress because br->hash_lock is held by A. Whereas
thread A cannot make progress because ocelot->stats_lock is held by B.
When taking the issue at face value, the bridge can avoid that problem
by simply making the ports promiscuous from a code path with a saner
calling context (br->hash_lock not held). A bridge port without
IFF_UNICAST_FLT is going to become promiscuous as soon as we call
dev_uc_add() on it (which we do unconditionally), so why not be
preemptive and make it promiscuous right from the beginning, so as to
not be taken by surprise.
With this, we've broken the links between code that holds br->hash_lock
or br->lock and code that calls into the ndo_change_rx_flags() or
ndo_get_stats64() ops of the bridge port.
Fixes: 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Merge tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
- Fix a type signature mismatch
- Drop Christoph as maintainer
* tag 'iomap-6.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: drop me [hch] from MAINTAINERS for iomap
fs: iomap: Change the type of blocksize from 'int' to 'unsigned int' in iomap_file_buffered_write_punch_delalloc
Merge tag 'v6.5/vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fix from Christian Brauner:
"A fix for the backing file work from this cycle.
When init_file() failed it would call file_free_rcu() on the file
allocated by the caller of init_file(). It naively assumed that the
correct cleanup operation would be called depending on whether it is a
regular file or a backing file. However, that presupposes that the
FMODE_BACKING flag would already be set which it won't be as that is
done in the caller of init_file().
Fix that bug by moving the cleanup of the allocated file into the
caller where it belongs in the first place. There's no good reason for
init_file() to consume resources it didn't allocate. This is a
mainline only fix and was reported by syzbot. The fix was validated by
syzbot against the provided reproducer"
* tag 'v6.5/vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: move cleanup from init_file() into its callers
Merge tag 'i2c-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- I2C has now a co-maintainer taking care of the host drivers. Welcome
Andi Shyti and have fun!
- platform remove callback converted to return void in drivers
- simplify drivers by using devm_clk_get_enabled()
- introduce i2c_get_match_data() to avoid more boilerplate code
(especially since the core stopped delivering an i2c_device_id)
- and the usual bunch of driver updates
* tag 'i2c-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (38 commits)
i2c: uniphier: Use devm_clk_get_enabled()
i2c: uniphier-f: Use devm_clk_get_enabled()
i2c: owl: Use devm_clk_get_enabled()
i2c: lpc2k: Use devm_clk_get_enabled()
i2c: hix5hd2: Use devm_clk_get_enabled()
i2c: sun6i-p2wi: Use devm_clk_get_enabled()
i2c: pasemi-platform: Use devm_clk_get_enabled()
i2c: mt7621: Use devm_clk_get_enabled()
i2c: xiic: Use devm_clk_get_enabled()
i2c: davinci: Use platform table macro over module_alias
i2c: ocores: use devm_ managed clks
i2c: nomadik: Use dev_err_probe() whenever possible
i2c: nomadik: Use devm_clk_get_enabled()
i2c: nomadik: Remove unnecessary goto label
usb: typec: ucsi: Mark dGPUs as DEVICE scope
i2c: wmt: Use devm_platform_get_and_ioremap_resource()
i2c: versatile: Use devm_platform_get_and_ioremap_resource()
i2c: hix5hd2: Add I2C_M_STOP flag support for i2c-hix5hd2 driver.
i2c: mpc: Use of_property_read_reg() to parse "reg"
i2c: imx-lpi2c: Don't open-code DIV_ROUND_UP
...
Darrick J. Wong [Fri, 30 Jun 2023 00:39:46 +0000 (17:39 -0700)]
xfs: fix xfs_btree_query_range callers to initialize btree rec fully
Use struct initializers to ensure that the xfs_btree_irecs passed into
the query_range function are completely initialized. No functional
changes, just closing some sloppy hygiene.
Darrick J. Wong [Fri, 30 Jun 2023 00:39:45 +0000 (17:39 -0700)]
xfs: validate fsmap offsets specified in the query keys
Improve the validation of the fsmap offset fields in the query keys and
move the validation to the top of the function now that we have pushed
the low key adjustment code downwards.
Also fix some indenting issues that aren't worth a separate patch.
Darrick J. Wong [Fri, 30 Jun 2023 00:39:45 +0000 (17:39 -0700)]
xfs: fix logdev fsmap query result filtering
The external log device fsmap backend doesn't have an rmapbt to query,
so it's wasteful to spend time initializing the rmap_irec objects.
Worse yet, the log could (someday) be longer than 2^32 fsblocks, so
using the rmap irec structure will result in integer overflows.
Fix this mess by computing the start address that we want from keys[0]
directly, and use the daddr-based record filtering algorithm that we
also use for rtbitmap queries.
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
Darrick J. Wong [Fri, 30 Jun 2023 00:39:44 +0000 (17:39 -0700)]
xfs: clean up the rtbitmap fsmap backend
The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to
spend time initializing the rmap_irec objects. Worse yet, the logic to
query the rtbitmap is spread across three separate functions, which is
unnecessarily difficult to follow.
Compute the start rtextent that we want from keys[0] directly and
combine the functions to avoid passing parameters around everywhere, and
consolidate all the logic into a single function. At one point many
years ago I intended to use __xfs_getfsmap_rtdev as the launching point
for realtime rmapbt queries, but this hasn't been the case for a long
time.
Darrick J. Wong [Fri, 30 Jun 2023 00:39:44 +0000 (17:39 -0700)]
xfs: fix getfsmap reporting past the last rt extent
The realtime section ends at the last rt extent. If the user configures
the rt geometry with an extent size that is not an integer factor of the
number of rt blocks, it's possible for there to be rt blocks past the
end of the last rt extent. These tail blocks cannot ever be allocated
and will cause corruption reports if the last extent coincides with the
end of an rt bitmap block, so do not report consider them for the
GETFSMAP output.
Darrick J. Wong [Fri, 30 Jun 2023 00:39:43 +0000 (17:39 -0700)]
xfs: fix integer overflows in the fsmap rtbitmap and logdev backends
It's not correct to use the rmap irec structure to hold query key
information to query the rtbitmap because the realtime volume can be
longer than 2^32 fsblocks in length. Because the rt volume doesn't have
allocation groups, introduce a daddr-based record filtering algorithm
and compute the rtextent values using 64-bit variables. The same
problem exists in the external log device fsmap implementation, so use
the same solution to fix it too.
After this patch, all the code that touches info->low and info->high
under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev are unnecessary.
Cleaning this up will be done in subsequent patches.
Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
That's not right -- we asked what block maps block 208, and we should've
received a mapping for inode 137 offset 16. Instead, we get nothing.
The root cause of this problem is a mis-interaction between the fsmap
code and how btree ranged queries work. xfs_btree_query_range returns
any btree record that overlaps with the query interval, even if the
record starts before or ends after the interval. Similarly, GETFSMAP is
supposed to return a recordset containing all records that overlap the
range queried.
However, it's possible that the recordset is larger than the buffer that
the caller provided to convey mappings to userspace. In /that/ case,
userspace is supposed to copy the last record returned to fmh_keys[0]
and call GETFSMAP again. In this case, we do not want to return
mappings that we have already supplied to the caller. The call to
xfs_btree_query_range is the same, but now we ignore any records that
start before fmh_keys[0].
Unfortunately, we didn't implement the filtering predicate correctly.
The predicate should only be called when we're calling back for more
records. Accomplish this by setting info->low.rm_blockcount to a
nonzero value and ensuring that it is cleared as necessary. As a
result, we no longer want to adjust dkeys[0] in the main setup function
because that's confusing.
This patch doesn't touch the logdev/rtbitmap backends because they have
bigger problems that will be addressed by subsequent patches.
Found via xfs/556 with parent pointers enabled.
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:45 +0000 (11:58 +0530)]
octeontx2-af: Reset MAC features in FLR
AF driver configures MAC features like internal loopback and PFC upon
receiving the request from PF and its VF netdev. But these
features are not getting reset in FLR. This patch fixes the issue by
resetting the same.
Fixes: 23999b30ae67 ("octeontx2-af: Enable or disable CGX internal loopback") Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <[email protected]> Signed-off-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:44 +0000 (11:58 +0530)]
octeontx2-af: Add validation before accessing cgx and lmac
with the addition of new MAC blocks like CN10K RPM and CN10KB
RPM_USX, LMACs are noncontiguous and CGX blocks are also
noncontiguous. But during RVU driver initialization, the driver
is assuming they are contiguous and trying to access
cgx or lmac with their id which is resulting in kernel panic.
This patch fixes the issue by adding proper checks.
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:43 +0000 (11:58 +0530)]
octeontx2-af: Fix mapping for NIX block from CGX connection
Firmware configures NIX block mapping for all MAC blocks.
The current implementation reads the configuration and
creates the mapping between RVU PF and NIX blocks. But
this configuration is only valid for silicons that support
multiple blocks. For all other silicons, all MAC blocks
map to NIX0.
This patch corrects the mapping by adding a check for the same.
Hariprasad Kelam [Fri, 30 Jun 2023 06:28:42 +0000 (11:58 +0530)]
octeontx2-af: cn10kb: fix interrupt csr addresses
The current design is that, for asynchronous events like link_up and
link_down firmware raises the interrupt to kernel. The previous patch
which added RPM_USX driver has a bug where it uses old csr addresses
for configuring interrupts. Which is resulting in losing interrupts
from source firmware.
This patch fixes the issue by correcting csr addresses.
David Howells [Thu, 29 Jun 2023 21:47:53 +0000 (22:47 +0100)]
nvme-tcp: Fix comma-related oops
Fix a comma that should be a semicolon. The comma is at the end of an
if-body and thus makes the statement after (a bvec_set_page()) conditional
too, resulting in an oops because we didn't fill out the bio_vec[]:
Merge tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Stable fixes and other bugfixes:
- nfs: don't report STATX_BTIME in ->getattr
- Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return'
since it breaks NFSv4 state recovery.
- NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
- Fix the NFSv4.2 xattr cache shrinker_id
- Force a ctime update after a NFSv4.2 SETXATTR call
Features and cleanups:
- NFS and RPC over TLS client code from Chuck Lever
- Support for use of abstract unix socket addresses with the rpcbind
daemon
- Sysfs API to allow shutdown of the kernel RPC client and prevent
umount() hangs if the server is known to be permanently down
- XDR cleanups from Anna"
* tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits)
Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
NFS: Don't cleanup sysfs superblock entry if uninitialized
nfs: don't report STATX_BTIME in ->getattr
NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
NFSv4.2: fix wrong shrinker_id
NFSv4: Clean up some shutdown loops
NFS: Cancel all existing RPC tasks when shutdown
NFS: add sysfs shutdown knob
NFS: add a sysfs link to the acl rpc_client
NFS: add a sysfs link to the lockd rpc_client
NFS: Add sysfs links to sunrpc clients for nfs_clients
NFS: add superblock sysfs entries
NFS: Make all of /sys/fs/nfs network-namespace unique
NFS: Open-code the nfs_kset kset_create_and_add()
NFS: rename nfs_client_kobj to nfs_net_kobj
NFS: rename nfs_client_kset to nfs_kset
NFS: Add an "xprtsec=" NFS mount option
NFS: Have struct nfs_client carry a TLS policy field
SUNRPC: Add a TCP-with-TLS RPC transport class
SUNRPC: Capture CMSG metadata on client-side receive
...
Merge tag 'x86-urgent-2023-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single regression fix for x86:
Moving the invocation of arch_cpu_finalize_init() earlier in the boot
process caused a boot regression on IBT enabled system.
The root cause is not the move of arch_cpu_finalize_init() itself. The
system fails to boot because the subsequent efi_enter_virtual_mode()
code has a non-IBT safe EFI call inside. This was not noticed before
because IBT was enabled after the EFI initialization.
Switching the EFI call to use the IBT safe wrapper cures the problem"
* tag 'x86-urgent-2023-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Make efi_set_virtual_address_map IBT safe
Merge tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Remove the deprecated rule to build *.dtbo from *.dts
- Refactor section mismatch detection in modpost
- Fix bogus ARM section mismatch detections
- Fix error of 'make gtags' with O= option
- Add Clang's target triple to KBUILD_CPPFLAGS to fix a build error
with the latest LLVM version
- Rebuild the built-in initrd when KBUILD_BUILD_TIMESTAMP is changed
- Ignore more compiler-generated symbols for kallsyms
- Fix 'make local*config' to handle the ${CONFIG_FOO} form in Makefiles
- Enable more kernel-doc warnings with W=2
- Refactor <linux/export.h> by generating KSYMTAB data by modpost
- Deprecate <asm/export.h> and <asm-generic/export.h>
- Remove the EXPORT_DATA_SYMBOL macro
- Move the check for static EXPORT_SYMBOL back to modpost, which makes
the build faster
- Re-implement CONFIG_TRIM_UNUSED_KSYMS with one-pass algorithm
- Warn missing MODULE_DESCRIPTION when building modules with W=1
- Make 'make clean' robust against too long argument error
- Exclude more objects from GCOV to fix CFI failures with GCOV
- Allow 'make modules_install' to install modules.builtin and
modules.builtin.modinfo even when CONFIG_MODULES is disabled
- Include modules.builtin and modules.builtin.modinfo in the
linux-image Debian package even when CONFIG_MODULES is disabled
- Revive "Entering directory" logging for the latest Make version
* tag 'kbuild-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (72 commits)
modpost: define more R_ARM_* for old distributions
kbuild: revive "Entering directory" for Make >= 4.4.1
kbuild: set correct abs_srctree and abs_objtree for package builds
scripts/mksysmap: Ignore prefixed KCFI symbols
kbuild: deb-pkg: remove the CONFIG_MODULES check in buildeb
kbuild: builddeb: always make modules_install, to install modules.builtin*
modpost: continue even with unknown relocation type
modpost: factor out Elf_Sym pointer calculation to section_rel()
modpost: factor out inst location calculation to section_rel()
kbuild: Disable GCOV for *.mod.o
kbuild: Fix CFI failures with GCOV
kbuild: make clean rule robust against too long argument error
script: modpost: emit a warning when the description is missing
kbuild: make modules_install copy modules.builtin(.modinfo)
linux/export.h: rename 'sec' argument to 'license'
modpost: show offset from symbol for section mismatch warnings
modpost: merge two similar section mismatch warnings
kbuild: implement CONFIG_TRIM_UNUSED_KSYMS without recursion
modpost: use null string instead of NULL pointer for default namespace
modpost: squash sym_update_namespace() into sym_add_exported()
...
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix memory corruption (overwriting the kmalloc redzone) when saving
the SVE state while in SVE streaming mode"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: sme: Use STR P to clear FFR context field in streaming SVE mode
Merge tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dan Williams:
"The highlights in terms of new functionality are support for the
standard CXL Performance Monitor definition that appeared in CXL 3.0,
support for device sanitization (wiping all data from a device),
secure-erase (re-keying encryption of user data), and support for
firmware update. The firmware update support is notable as it reuses
the simple sysfs_upload interface to just cat(1) a blob to a sysfs
file and pipe that to the device.
Additionally there are a substantial number of cleanups and
reorganizations to get ready for RCH error handling (RCH == Restricted
CXL Host == current shipping hardware generation / pre CXL-2.0
topologies) and type-2 (accelerator / vendor specific) devices.
For vendor specific devices they implement a subset of what the
generic type-3 (generic memory expander) driver expects. As a result
the rework decouples optional infrastructure from the core driver
context.
For RCH topologies, where the specification working group did not want
to confuse pre-CXL-aware operating systems, many of the standard
registers are hidden which makes support standard bus features like
AER (PCIe Advanced Error Reporting) difficult. The rework arranges for
the driver to help the PCI-AER core. Bjorn is on board with this
direction but a late regression disocvery means the completion of this
functionality needs to cook a bit longer, so it is code
reorganizations only for now.
Summary:
- Add infrastructure for supporting background commands along with
support for device sanitization and firmware update
- Introduce a CXL performance monitoring unit driver based on the
common definition in the specification.
- Land some preparatory cleanup and refactoring for the anticipated
arrival of CXL type-2 (accelerator devices) and CXL RCH (CXL-v1.1
topology) error handling.
- Rework CPU cache management with respect to region configuration
(device hotplug or other dynamic changes to memory interleaving)
- Fix region reconfiguration vs CXL decoder ordering rules"
* tag 'cxl-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (51 commits)
cxl: Fix one kernel-doc comment
cxl/pci: Use correct flag for sanitize polling
docs: perf: Minimal introduction the the CXL PMU device and driver
perf: CXL Performance Monitoring Unit driver
tools/testing/cxl: add firmware update emulation to CXL memdevs
tools/testing/cxl: Use named effects for the Command Effect Log
tools/testing/cxl: Fix command effects for inject/clear poison
cxl: add a firmware update mechanism using the sysfs firmware loader
cxl/test: Add Secure Erase opcode support
cxl/mem: Support Secure Erase
cxl/test: Add Sanitize opcode support
cxl/mem: Wire up Sanitization support
cxl/mbox: Add sanitization handling machinery
cxl/mem: Introduce security state sysfs file
cxl/mbox: Allow for IRQ_NONE case in the isr
Revert "cxl/port: Enable the HDM decoder capability for switch ports"
cxl/memdev: Formalize endpoint port linkage
cxl/pci: Unconditionally unmask 256B Flit errors
cxl/region: Manage decoder target_type at decoder-attach time
cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEM
...
Merge tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm and DAX updates from Vishal Verma:
"This is mostly small cleanups and fixes, with the biggest change being
the change to the DAX fault handler allowing it to return
VM_FAULT_HWPOISON.
Summary:
- DAX fixes and cleanups including a use after free, extra
references, and device unregistration, and a redundant variable.
- Allow the DAX fault handler to return VM_FAULT_HWPOISON
- A few libnvdimm cleanups such as making some functions and
variables static where sufficient.
- Add a few missing prototypes for wrapped functions in
tools/testing/nvdimm"
* tag 'libnvdimm-for-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: enable dax fault handler to report VM_FAULT_HWPOISON
nvdimm: make security_show static
nvdimm: make nd_class variable static
dax/kmem: Pass valid argument to memory_group_register_static
fsdax: remove redundant variable 'error'
dax: Cleanup extra dax_region references
dax: Introduce alloc_dev_dax_id()
dax: Use device_unregister() in unregister_dax_mapping()
dax: Fix dax_mapping_release() use after free
tools/testing/nvdimm: Drop empty platform remove function
libnvdimm: mark 'security_show' static again
testing: nvdimm: add missing prototypes for wrapped functions
dax: fix missing-prototype warnings
Merge tag 'sysctl-fixes-v2-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull another sysctl fix from Luis Chamberlain:
"Just one minor nit I forgot to merge"
* tag 'sysctl-fixes-v2-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: set variable sysctl_mount_point storage-class-specifier to static
Merge tag 'flex-array-transformations-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array update from Gustavo Silva:
"Transform a zero-length array into a C99 flexible-array member.
This addresses a build failure with Clang by fixing multiple
'-Warray-bounds' warnings in drivers/staging/ks7010/ks_wlan_net.c"
* tag 'flex-array-transformations-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
uapi: wireless: Replace zero-length array with flexible-array member
Before commit d67790ddf021 ("overflow: Add struct_size_t() helper") only
struct_size() existed, which expects a valid pointer instance containing
the flexible array.
However, when we determine the default struct pid allocation size for
the associated kmem cache of a pid namespace we need to take the nesting
depth of the pid namespace into account without an variable instance
necessarily being available.
In commit b69f0aeb0689 ("pid: Replace struct pid 1-element array with
flex-array") we used to handle this the old fashioned way and cast NULL
to a struct pid pointer type. However, we do apparently have a dedicated
struct_size_t() helper for exactly this case. So switch to that.
Liam R. Howlett [Fri, 30 Jun 2023 02:28:16 +0000 (22:28 -0400)]
mm: Update do_vmi_align_munmap() return semantics
Since do_vmi_align_munmap() will always honor the downgrade request on
the success, the callers no longer have to deal with confusing return
codes. Since all callers that request downgrade actually want the lock
to be dropped, change the downgrade to an unlock request.
Note that the lock still needs to be held in read mode during the page
table clean up to avoid races with a map request.
Update do_vmi_align_munmap() to return 0 for success. Clean up the
callers and comments to always expect the unlock to be honored on the
success path. The error path will always leave the lock untouched.
As part of the cleanup, the wrapper function do_vmi_munmap() and callers
to the wrapper are also updated.
Now that stack growth must always hold the mmap_lock for write, we can
always downgrade the mmap_lock to read and safely unmap pages from the
page table, even if we're next to a stack.
Max Filippov [Sat, 1 Jul 2023 10:31:55 +0000 (03:31 -0700)]
xtensa: fix lock_mm_and_find_vma in case VMA not found
MMU version of lock_mm_and_find_vma releases the mm lock before
returning when VMA is not found. Do the same in noMMU version.
This fixes hang on an attempt to handle protection fault.
Fixes: d85a143b69ab ("xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion") Signed-off-by: Max Filippov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
docs: networking: Update codeaurora references for rmnet
source.codeaurora.org is no longer accessible and so the reference link
in the documentation is not useful. Use iproute2 instead as it has a
rmnet module for configuration.
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Sean Tranchetti <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 configurations (default, RmNet, ECM)
are the same.
In CDC-ECM mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: CDC-ECM interface
Paolo Bonzini [Sat, 1 Jul 2023 11:04:29 +0000 (07:04 -0400)]
Merge tag 'kvmarm-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.5
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of block splitting in the stage-2
fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact with
services that live in the Secure world. pKVM intervenes on FF-A calls
to guarantee the host doesn't misuse memory donated to the hyp or a
pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set configuration
from userspace, but the intent is to relax this limitation and allow
userspace to select a feature set consistent with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the hypervisor
when running in protected mode, as the host is untrusted at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization Traps
(FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has broken
hardware A/D state management.
As a consequence of the hVHE series reworking the arm64 software
features framework, the for-next/module-alloc branch from the arm64 tree
comes along for the ride.
Merge tag '6.5-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Deferred close fix
- Debugging improvements: display missing mount option, dump rc on
invalidate inode failures, print client_guid in DebugData, log
session id when matching session not found in reconnect, new dynamic
tracepoint for session not found
- Mount fixes including: potential null dereference, and possible
memory leak and path name parsing when double slashes
- Fix potential use after free in compounding
- Two crediting (flow control) fixes: fix for crediting leak (stress
scenario with excess lease credits) and better locking around
updating credits
- Three cleanups from issues pointed out by the kernel test robot
- Session state check improvements (including for potential use after
free)
- DFS fixes: Fix for getattr on link when DFS disabled, fix for DFS
mounts to same share with different prefix paths, DFS mount error
checking improvement
* tag '6.5-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
cifs: new dynamic tracepoint to track ses not found errors
cifs: log session id when a matching ses is not found
smb: client: improve DFS mount check
smb: client: fix shared DFS root mounts with different prefixes
smb: client: fix parsing of source mount option
smb: client: fix broken file attrs with nodfs mounts
cifs: print client_guid in DebugData
cifs: fix session state check in smb2_find_smb_ses
cifs: fix session state check in reconnect to avoid use-after-free issue
cifs: do all necessary checks for credits within or before locking
cifs: prevent use-after-free by freeing the cfile later
smb: client: fix warning in generic_ip_connect()
smb: client: fix warning in CIFSFindNext()
smb: client: fix warning in CIFSFindFirst()
smb3: do not reserve too many oplock credits
cifs: print more detail when invalidate_inode_mapping fails
smb: client: fix warning in cifs_smb3_do_mount()
smb: client: fix warning in cifs_match_super()
cifs: print nosharesock value while dumping mount options
SMB3: Do not send lease break acknowledgment if all file handles have been closed
Merge tag '6.5-rc-ksmbd-server-fixes-part1' of git://git.samba.org/ksmbd
Pull ksmbd server updates from Steve French:
- two fixes for compounding bugs (make sure no out of bound reads with
less common combinations of commands in the compound)
- eight minor cleanup patches (e.g. simplifying return values, replace
one element array, use of kzalloc where simpler)
- fix for clang warning on possible overflow in filename conversion
* tag '6.5-rc-ksmbd-server-fixes-part1' of git://git.samba.org/ksmbd:
ksmbd: avoid field overflow warning
ksmbd: Replace one-element array with flexible-array member
ksmbd: Use struct_size() helper in ksmbd_negotiate_smb_dialect()
ksmbd: add missing compound request handing in some commands
ksmbd: fix out of bounds read in smb2_sess_setup
ksmbd: Replace the ternary conditional operator with min()
ksmbd: use kvzalloc instead of kvmalloc
ksmbd: Change the return value of ksmbd_vfs_query_maximal_access to void
ksmbd: return a literal instead of 'err' in ksmbd_vfs_kern_path_locked()
ksmbd: use kzalloc() instead of __GFP_ZERO
ksmbd: remove unused ksmbd_tree_conn_share function
Merge tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Although some more stuff is brewing, the EFI changes that are ready
for mainline are few this cycle:
- improve the PCI DMA paranoia logic in the EFI stub
- some constification changes
- add statfs support to efivarfs
- allow user space to enumerate updatable firmware resources without
CAP_SYS_ADMIN"
* tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: Disable PCI DMA before grabbing the EFI memory map
efi/esrt: Allow ESRT access without CAP_SYS_ADMIN
efivarfs: expose used and total size
efi: make kobj_type structure constant
efi: x86: make kobj_type structure constant
Merge tag 'v6.5-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add linear akcipher/sig API
- Add tfm cloning (hmac, cmac)
- Add statesize to crypto_ahash
Algorithms:
- Allow only odd e and restrict value in FIPS mode for RSA
- Replace LFSR with SHA3-256 in jitter
- Add interface for gathering of raw entropy in jitter
Drivers:
- Fix race on data_avail and actual data in hwrng/virtio
- Add hash and HMAC support in starfive
- Add RSA algo support in starfive
- Add support for PCI device 0x156E in ccp"
* tag 'v6.5-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (85 commits)
crypto: akcipher - Do not copy dst if it is NULL
crypto: sig - Fix verify call
crypto: akcipher - Set request tfm on sync path
crypto: sm2 - Provide sm2_compute_z_digest when sm2 is disabled
hwrng: imx-rngc - switch to DEFINE_SIMPLE_DEV_PM_OPS
hwrng: st - keep clock enabled while hwrng is registered
hwrng: st - support compile-testing
hwrng: imx-rngc - fix the timeout for init and self check
KEYS: asymmetric: Use new crypto interface without scatterlists
KEYS: asymmetric: Move sm2 code into x509_public_key
KEYS: Add forward declaration in asymmetric-parser.h
crypto: sig - Add interface for sign/verify
crypto: akcipher - Add sync interface without SG lists
crypto: cipher - On clone do crypto_mod_get()
crypto: api - Add __crypto_alloc_tfmgfp
crypto: api - Remove crypto_init_ops()
crypto: rsa - allow only odd e and restrict value in FIPS mode
crypto: geniv - Split geniv out of AEAD Kconfig option
crypto: algboss - Add missing dependency on RNG2
crypto: starfive - Add RSA algo support
...
xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion
It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:
config PFAULT
bool "Handle protection faults" if EXPERT && !MMU
default y
help
Handle protection faults. MMU configurations must enable it.
noMMU configurations may disable it if used memory map never
generates protection faults or faults are always fatal.
If unsure, say Y.
which completely violated my expectations of the page fault handling.
End result: Guenter reports that the xtensa no-MMU builds all fail with
arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’
because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.
Chao Yu [Thu, 29 Jun 2023 11:11:44 +0000 (19:11 +0800)]
f2fs: fix to do sanity check on direct node in truncate_dnode()
syzbot reports below bug:
BUG: KASAN: slab-use-after-free in f2fs_truncate_data_blocks_range+0x122a/0x14c0 fs/f2fs/file.c:574
Read of size 4 at addr ffff88802a25c000 by task syz-executor148/5000
The root cause is, inodeA references inodeB via inodeB's ino, once inodeA
is truncated, it calls truncate_dnode() to truncate data blocks in inodeB's
node page, it traverse mapping data from node->i.i_addr[0] to
node->i.i_addr[ADDRS_PER_BLOCK() - 1], result in out-of-boundary access.
This patch fixes to add sanity check on dnode page in truncate_dnode(),
so that, it can help to avoid triggering such issue, and once it encounters
such issue, it will record newly introduced ERROR_INVALID_NODE_REFERENCE
error into superblock, later fsck can detect such issue and try repairing.
Also, it removes f2fs_truncate_data_blocks() for cleanup due to the
function has only one caller, and uses f2fs_truncate_data_blocks_range()
instead.
Sheng Yong [Tue, 27 Jun 2023 12:21:53 +0000 (20:21 +0800)]
f2fs: only set release for file that has compressed data
If a file is not comprssed yet or does not have compressed data,
for example, its data has a very low compression ratio, do not
set FI_COMPRESS_RELEASED flag.
Chao Yu [Thu, 29 Jun 2023 01:41:34 +0000 (09:41 +0800)]
f2fs: fix compile warning in f2fs_destroy_node_manager()
fs/f2fs/node.c: In function ‘f2fs_destroy_node_manager’:
fs/f2fs/node.c:3390:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=]
3390 | }
Merging below pointer arrays into common one, and reuse it by cast type.
Jason Baron [Fri, 23 Jun 2023 18:05:23 +0000 (14:05 -0400)]
md/raid0: add discard support for the 'original' layout
We've found that using raid0 with the 'original' layout and discard
enabled with different disk sizes (such that at least two zones are
created) can result in data corruption. This is due to the fact that
the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
layout. We've seen this corruption using ext4 but other filesystems are
likely susceptible as well.
More specifically, while multiple zones are necessary to create the
corruption, the corruption may not occur with multiple zones if they
layout in such a way the layout matches what the 'alternate' layout
would have produced. Thus, not all raid0 devices with the 'original'
layout, different size disks and discard enabled will encounter this
corruption.
The 3.14 kernel inadvertently changed the raid0 disk layout for different
size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
same raid0 array could corrupt data. This lead to the creation of the
'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
(to match the post 3.14 layout) in the 5.4 kernel time frame and an option
to tell the kernel which layout to use (since it couldn't be autodetected).
However, when the 'original' layout was added back to 5.4 discard support
for the 'original' layout was not added leading this issue.
I've been able to reliably reproduce the corruption with the following
test case:
1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)
Let's fix this by adding proper discard support to the 'original' layout.
The fix 'maps' the 'original' layout disks to the order in which they are
read/written such that we can compare the disks in the same way that the
current 'alternate' layout does. A 'disk_shift' field is added to
'struct strip_zone'. This could be computed on the fly in
raid0_handle_discard() but by adding this field, we save some computation
in the discard path.
Note we could also potentially fix this by re-ordering the disks in the
zones that follow the first one, and then always read/writing them using
the 'alternate' layout. However, that is seen as a more substantial change,
and we are attempting the least invasive fix at this time to remedy the
corruption.
I've verified the change using the reproducer mentioned above. Typically,
the corruption is seen after less than 3 iterations, while the patch has
run 500+ iterations.
Nishanth Menon [Wed, 21 Jun 2023 01:00:22 +0000 (20:00 -0500)]
mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
Sec proxy/message manager data buffer is 60 bytes with the last of the
registers indicating transmission completion. This however poses a bit
of a challenge.
The backing memory for sec_proxy / message manager is regular memory,
and all sec proxy does is to trigger a burst of all 60 bytes of data
over to the target thread backing ring accelerator. It doesn't do a
memory scrub when it moves data out in the burst. When we transmit
multiple messages, remnants of previous message is also transmitted
which results in some random data being set in TISCI fields of
messages that have been expanded forward.
The entire concept of backward compatibility hinges on the fact that
the unused message fields remain 0x0 allowing for 0x0 value to be
specially considered when backward compatibility of message extension
is done.
So, instead of just writing the completion register, we continue
to fill the message buffer up with 0x0 (note: for partial message
involving completion, we already do this).
This allows us to scale and introduce ABI changes back also work with
other boot stages that may have left data in the internal memory.
While at this, be consistent and explicit with the data_reg pointer
increment.
Add the compatible string for the HSP block found on the Tegra264 SoC.
The HSP block in Tegra264 is not register compatible with the one in
Tegra194 or Tegra234 hence there is no fallback compatibility string.
Linus Torvalds [Fri, 30 Jun 2023 22:22:09 +0000 (15:22 -0700)]
Merge tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Adjust log levels for common messages (Oleksandr Natalenko, Alex
Williamson)
- Support for dynamic MSI-X allocation (Reinette Chatre)
- Enable and report PCIe AtomicOp Completer capabilities (Alex
Williamson)
- Cleanup Kconfigs for vfio bus drivers (Alex Williamson)
- Add support for CDX bus based devices (Nipun Gupta)
- Fix race with concurrent mdev initialization (Eric Farman)
* tag 'vfio-v6.5-rc1' of https://github.com/awilliam/linux-vfio:
vfio/mdev: Move the compat_class initialization to module init
vfio/cdx: add support for CDX bus
vfio/fsl: Create Kconfig sub-menu
vfio/platform: Cleanup Kconfig
vfio/pci: Cleanup Kconfig
vfio/pci-core: Add capability for AtomicOp completer support
vfio/pci: Also demote hiding standard cap messages
vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X
vfio/pci: Support dynamic MSI-X
vfio/pci: Probe and store ability to support dynamic MSI-X
vfio/pci: Use bitfield for struct vfio_pci_core_device flags
vfio/pci: Update stale comment
vfio/pci: Remove interrupt context counter
vfio/pci: Use xarray for interrupt context storage
vfio/pci: Move to single error path
vfio/pci: Prepare for dynamic interrupt context storage
vfio/pci: Remove negative check on unsigned vector
vfio/pci: Consolidate irq cleanup on MSI/MSI-X disable
vfio/pci: demote hiding ecap messages to debug level
Linus Torvalds [Fri, 30 Jun 2023 21:50:00 +0000 (14:50 -0700)]
Merge tag 'platform-drivers-x86-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"AMD PMC and PMF drivers:
- Various bugfixes
- Improved debugging support
Intel PMC:
- Refactor to support hw with multiple PMCs
- Various other improvements / new hw support
Intel Speed Select Technology (ISST):
- TPMI Uncore Frequency + Cluster Level Power Controls
- Various bugfixes
- tools/intel-speed-select: Misc improvements
Dell-DDV: Add documentation
INT3472 ACPI camera sensor glue code:
- Evaluate device's _DSM method to control imaging clock
- Drop the need to have a table with per sensor-model info
Lenovo Yogabook:
- Refactor / rework to also support Android models
Think-LMI:
- Multiple improvements and fixes
WMI:
- Add proper API documentation for the WMI bus
x86-android-tablets:
- Misc new hw support
Miscellaneous other cleanups / fixes"
* tag 'platform-drivers-x86-v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (91 commits)
platform/x86:intel/pmc: Add Meteor Lake IOE-M PMC related maps
platform/x86:intel/pmc: Add Meteor Lake IOE-P PMC related maps
platform/x86:intel/pmc: Use SSRAM to discover pwrm base address of primary PMC
platform/x86:intel/pmc: Discover PMC devices
platform/x86:intel/pmc: Enable debugfs multiple PMC support
platform/x86:intel/pmc: Add support to handle multiple PMCs
platform/x86:intel/pmc: Combine core_init() and core_configure()
platform/x86:intel/pmc: Update maps for Meteor Lake P/M platforms
platform/x86/intel: tpmi: Remove hardcoded unit and offset
platform/x86: int3472: discrete: Log a warning if the pin-numbers don't match
platform/x86: int3472: discrete: Use FIELD_GET() on the GPIO _DSM return value
platform/x86: int3472: discrete: Add alternative "AVDD" regulator supply name
platform/x86: int3472: discrete: Add support for 1 GPIO regulator shared between 2 sensors
platform/x86: int3472: discrete: Remove sensor_config-s
platform/x86: int3472: discrete: Drop GPIO remapping support
platform/x86: apple-gmux: don't use be32_to_cpu and cpu_to_be32
platform/x86/dell/dell-rbtn: Fix resources leaking on error path
platform/x86: ISST: Fix usage counter
platform/x86: ISST: Reset default callback on unregister
platform/x86: int3472: Switch back to use struct i2c_driver's .probe()
...