Takashi Iwai [Tue, 16 Aug 2022 13:07:55 +0000 (15:07 +0200)]
Merge tag 'asoc-fix-v6.0-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.0
A relatively large batch of fixes that came in since my pull request,
none of them too major and mostly device specific apart from a series of
security/robustness improvements from Takashi.
Sergey Gorenko [Fri, 5 Aug 2022 06:01:35 +0000 (09:01 +0300)]
IB/iser: Fix login with authentication
The iSER Initiator uses two types of receive buffers:
- one big login buffer posted by iser_post_recvl();
- several small message buffers posted by iser_post_recvm().
The login buffer is used at the login phase and full feature phase in
the discovery session. It may take a few requests and responses to
complete the login phase. The message buffers are only used in the
normal operational session at the full feature phase.
After the commit referred in the fixes line, the login operation fails
if the authentication is enabled. That happens because the Initiator
posts a small receive buffer after the first response from Target. So,
the next send operation fails because Target's second response does not
fit into the small receive buffer.
This commit adds additional checks to prevent posting small receive
buffers until the full feature phase.
ZiyangZhang [Mon, 15 Aug 2022 02:36:33 +0000 (10:36 +0800)]
ublk_drv: do not add a re-issued request aborted previously to ioucmd's task_work
In ublk_queue_rq(), Assume current request is a re-issued request aborted
previously in monitor_work because the ubq_daemon(ioucmd's task) is
PF_EXITING. For this request, we cannot call
io_uring_cmd_complete_in_task() anymore because at that moment io_uring
context may be freed in case that no inflight ioucmd exists. Otherwise,
we may cause null-deref in ctx->fallback_work.
Add a check on UBLK_IO_FLAG_ABORTED to prevent the above situation. This
check is safe and makes sense.
Note: monitor_work sets UBLK_IO_FLAG_ABORTED and ends this request
(releasing the tag). Then the request is restarted(allocating the tag)
and we are here. Since releasing/allocating a tag implies smp_mb(),
finding UBLK_IO_FLAG_ABORTED guarantees that here is a re-issued request
aborted previously.
ZiyangZhang [Mon, 15 Aug 2022 02:36:32 +0000 (10:36 +0800)]
ublk_drv: update comment for __ublk_fail_req()
Since __ublk_rq_task_work always fails requests immediately during
exiting, __ublk_fail_req() is only called from abort context during
exiting. So lock is unnecessary.
virtio_pci: Revert "virtio_pci: support the arg sizes of find_vqs()"
This reverts commit cdb44806fca2d0ad29ca644cbf1505433902ee0c: the legacy
path is wrong and in fact can not support the proposed API since for a
legacy device we never communicate the vq size to the hypervisor.
This has been reported to trip up guests on GCP (Google Cloud).
The reason is that virtio_find_vqs_ctx_size is broken on legacy
devices. We can in theory fix virtio_find_vqs_ctx_size but
in fact the patch itself has several other issues:
- It treats unknown speed as < 10G
- It leaves userspace no way to find out the ring size set by hypervisor
- It tests speed when link is down
- It ignores the virtio spec advice:
Both \field{speed} and \field{duplex} can change, thus the driver
is expected to re-read these values after receiving a
configuration change notification.
- It is not clear the performance impact has been tested properly
We may account memory to a memcg of a request that didn't even got to
the network layer. It's not a bug as it'll be routinely cleaned up on
flush, but it might be confusing for the userspace.
Pavel Begunkov [Mon, 15 Aug 2022 12:42:00 +0000 (13:42 +0100)]
io_uring/net: use right helpers for async recycle
We have a helper that checks for whether a request contains anything in
->async_data or not, namely req_has_async_data(). It's better to use it
as it might have some extra considerations.
Jakub Kicinski [Tue, 16 Aug 2022 03:14:39 +0000 (20:14 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-08-12 (iavf)
This series contains updates to iavf driver only.
Przemyslaw frees memory for admin queues in initialization error paths,
prevents freeing of vf_res which is causing null pointer dereference,
and adjusts calls in error path of reset to avoid iavf_close() which
could cause deadlock.
Ivan Vecera avoids deadlock that can occur when driver if part of
failover.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
iavf: Fix deadlock in initialization
iavf: Fix reset error handling
iavf: Fix NULL pointer dereference in iavf_get_link_ksettings
iavf: Fix adminq error handling
====================
Zhengchao Shao [Mon, 15 Aug 2022 02:46:29 +0000 (10:46 +0800)]
net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msg
When bulk delete command is received in the rtnetlink_rcv_msg function,
if bulk delete is not supported, module_put is not called to release
the reference counting. As a result, module reference count is leaked.
Namjae Jeon [Sun, 14 Aug 2022 13:40:25 +0000 (22:40 +0900)]
ksmbd: don't remove dos attribute xattr on O_TRUNC open
When smb client open file in ksmbd share with O_TRUNC, dos attribute
xattr is removed as well as data in file. This cause the FSCTL_SET_SPARSE
request from the client fails because ksmbd can't update the dos attribute
after setting ATTR_SPARSE_FILE. And this patch fix xfstests generic/469
test also.
Hyunchul Lee [Fri, 12 Aug 2022 02:11:32 +0000 (11:11 +0900)]
ksmbd: remove unnecessary generic_fillattr in smb2_open
Remove unnecessary generic_fillattr to fix wrong
AllocationSize of SMB2_CREATE response, And
Move the call of ksmbd_vfs_getattr above the place
where stat is needed because of truncate.
This patch fixes wrong AllocationSize of SMB2_CREATE
response. Because ext4 updates inode->i_blocks only
when disk space is allocated, generic_fillattr does
not set stat.blocks properly for delayed allocation.
But ext4 returns the blocks that include the delayed
allocation blocks when getattr is called.
Sander Vanheule [Tue, 9 Aug 2022 17:36:35 +0000 (19:36 +0200)]
lib/cpumask: drop always-true preprocessor guard
Since lib/cpumask.o is only built for CONFIG_SMP=y, NR_CPUS will always
be greater than 1 at compile time. This makes checking for that
condition unnecesarry, so it can be dropped.
Sander Vanheule [Tue, 9 Aug 2022 17:36:34 +0000 (19:36 +0200)]
lib/cpumask: add inline cpumask_next_wrap() for UP
In the uniprocessor case, cpumask_next_wrap() can be simplified, as the
number of valid argument combinations is limited:
- 'start' can only be 0
- 'n' can only be -1 or 0
The only valid CPU that can then be returned, if any, will be the first
one set in the provided 'mask'.
For NR_CPUS == 1, include/linux/cpumask.h now provides an inline
definition of cpumask_next_wrap(), which will conflict with the one
provided by lib/cpumask.c. Make building of lib/cpumask.o again depend
on CONFIG_SMP=y (i.e. NR_CPUS > 1) to avoid the re-definition.
Sander Vanheule [Tue, 9 Aug 2022 17:36:33 +0000 (19:36 +0200)]
cpumask: align signatures of UP implementations
Between the generic version, and their uniprocessor optimised
implementations, the return types of cpumask_any_and_distribute() and
cpumask_any_distribute() are not identical. Change the UP versions to
'unsigned int', to match the generic versions.
Liming Sun [Tue, 9 Aug 2022 17:37:42 +0000 (13:37 -0400)]
mmc: sdhci-of-dwcmshc: Re-enable support for the BlueField-3 SoC
The commit 08f3dff799d4 (mmc: sdhci-of-dwcmshc: add rockchip platform
support") introduces the use of_device_get_match_data() to check for some
chips. Unfortunately, it also breaks the BlueField-3 FW, which uses ACPI.
To fix the problem, let's add the ACPI match data and the corresponding
quirks to re-enable the support for the BlueField-3 SoC.
selftests/landlock: fix broken include of linux/landlock.h
Revert part of the earlier changes to fix the kselftest build when
using a sub-directory from the top of the tree as this broke the
landlock test build as a side-effect when building with "make -C
tools/testing/selftests/landlock".
netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified
Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with
multiple ranged fields"), it possible to combine intervals and
concatenations. Later on, ef516e8625dd ("netfilter: nf_tables:
reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag
for userspace to report that the set stores a concatenation.
Make sure NFT_SET_CONCAT is set on if field_count is specified for
consistency. Otherwise, if NFT_SET_CONCAT is specified with no
field_count, bail out with EINVAL.
Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <[email protected]>
Al Viro [Mon, 8 Aug 2022 15:09:45 +0000 (16:09 +0100)]
nios2: add force_successful_syscall_return()
If we use the ancient SysV syscall ABI, we'd better have tell the
kernel how to claim that a negative return value is a success.
Use ->orig_r2 for that - it's inaccessible via ptrace, so it's
a fair game for changes and it's normally[*] non-negative on return
from syscall. Set to -1; syscall is not going to be restart-worthy
by definition, so we won't interfere with that use either.
[*] the only exception is rt_sigreturn(), where we skip the entire
messing with r1/r2 anyway.
netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags
If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the
netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
NFTA_SET_ELEM_KEY_END should not be present.
For catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
The NFT_SET_ELEM_INTERVAL_END is never used with this set flags
combination.
s390/ap: fix crash on older machines based on QCI info missing
On older z series machines (z12 and older) there is no QCI info
available. The AP code took care of this and the AP bus scan then
switched to simple probing via TAPQ.
With commit 283915850a44 ("s390/ap: notify drivers on config changed and scan complete callbacks")
some code was introduced which silently assumed that the QCI info is
always available. However, with KVM simulating an older machine (z12)
the result was a kernel crash. Funnily the same crash does not happen
on LPAR - maybe because NULL is a valid pointer and reading some data
from address 0 also works fine.
This fix now improves the code to be aware that the QCI instruction
may not be available on older machines and thus the two pointers to
QCI info structs may simple be NULL.
However, on a machine not providing the QCI info the two callbacks to
the zcrypt device drivers on_config_changed() and on_scan_complete()
provide parameters which are pointers to a QCI info struct.
These both callbacks are NOT served if there is no QCI info available.
The only consumer of these callbacks is the vfio device driver. This
driver only supports CEX4 and higher. All physical machines which are
able to provide CEX4 cards have QCI support available. So there is
no sense in for example fill the QCI info struct by hand with looping
over cards and queues and TAPQ each APQN.
Signed-off-by: Harald Freudenberger <[email protected]> Signed-off-by: Tony Krowiak <[email protected]> Cc: [email protected] Fixes: 283915850a44 ("s390/ap: notify drivers on config changed and scan complete callbacks") Signed-off-by: Alexander Gordeev <[email protected]>
Wenbin Mei [Thu, 28 Jul 2022 08:00:48 +0000 (16:00 +0800)]
mmc: mtk-sd: Clear interrupts when cqe off/disable
Currently we don't clear MSDC interrupts when cqe off/disable, which led
to the data complete interrupt will be reserved for the next command.
If the next command with data transfer after cqe off/disable, we process
the CMD ready interrupt and trigger DMA start for data, but the data
complete interrupt is already exists, then SW assume that the data transfer
is complete, SW will trigger DMA stop, but the data may not be transmitted
yet or is transmitting, so we may encounter the following error:
mtk-msdc 11230000.mmc: CMD bus busy detected.
David S. Miller [Mon, 15 Aug 2022 10:49:58 +0000 (11:49 +0100)]
Merge branch 'mlxsw-fixes'
Petr Machata says:
====================
mlxsw: Fixes for PTP support
This set fixes several issues in mlxsw PTP code.
- Patch #1 fixes compilation warnings.
- Patch #2 adjusts the order of operation during cleanup, thereby
closing the window after PTP state was already cleaned in the ASIC
for the given port, but before the port is removed, when the user
could still in theory make changes to the configuration.
- Patch #3 protects the PTP configuration with a custom mutex, instead
of relying on RTNL, which is not held in all access paths.
- Patch #4 forbids enablement of PTP only in RX or only in TX. The
driver implicitly assumed this would be the case, but neglected to
sanitize the configuration.
====================
Amit Cohen [Fri, 12 Aug 2022 15:32:03 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TX
Currently mlxsw driver configures one global PTP configuration for all
ports. The reason is that the switch behaves like a transparent clock
between CPU port and front-panel ports. When time stamp is enabled in
any port, the hardware is configured to update the correction field. The
fact that the configuration of CPU port affects all the ports, makes the
correction field update to be global for all ports. Otherwise, user will
see odd values in the correction field, as the switch will update the
correction field in the CPU port, but not in all the front-panel ports.
The CPU port is relevant in both RX and TX, so to avoid problematic
configuration, forbid PTP enablement only in one direction, i.e., only in
RX or TX.
Without the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 0
rx_filter 2
$ echo $?
0
With the change:
$ hwstamp_ctl -i swp1 -r 12 -t 0
current settings:
tx_type 1
rx_filter 2
SIOCSHWTSTAMP failed: Invalid argument
Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Amit Cohen [Fri, 12 Aug 2022 15:32:02 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Protect PTP configuration with a mutex
Currently the functions mlxsw_sp2_ptp_{configure, deconfigure}_port()
assume that they are called when RTNL is locked and they warn otherwise.
The deconfigure function can be called when port is removed, for example
as part of device reload, then there is no locked RTNL and the function
warns [1].
To avoid such case, do not assume that RTNL protects this code, add a
dedicated mutex instead. The mutex protects 'ptp_state->config' which
stores the existing global configuration in hardware. Use this mutex also
to protect the code which configures the hardware. Then, there will be
only one configuration in any time, which will be updated in 'ptp_state'
and a race will be avoided.
Amit Cohen [Fri, 12 Aug 2022 15:32:01 +0000 (17:32 +0200)]
mlxsw: spectrum: Clear PTP configuration after unregistering the netdevice
Currently as part of removing port, PTP API is called to clear the
existing configuration and set the 'rx_filter' and 'tx_type' to zero.
The clearing is done before unregistering the netdevice, which means that
there is a window of time in which the user can reconfigure PTP in the
port, and this configuration will not be cleared.
Reorder the operations, clear PTP configuration after unregistering the
netdevice.
Fixes: 8748642751ede ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Amit Cohen [Fri, 12 Aug 2022 15:32:00 +0000 (17:32 +0200)]
mlxsw: spectrum_ptp: Fix compilation warnings
In case that 'CONFIG_PTP_1588_CLOCK' is not enabled in the config file,
there are implementations for the functions
mlxsw_{sp,sp2}_ptp_txhdr_construct() as part of 'spectrum_ptp.h'. In this
case, they should be defined as 'static' as they are not supposed to be
used out of this file. Make the functions 'static', otherwise the following
warnings are returned:
"warning: no previous prototype for 'mlxsw_sp_ptp_txhdr_construct'"
"warning: no previous prototype for 'mlxsw_sp2_ptp_txhdr_construct'"
In addition, make the functions 'inline' for case that 'spectrum_ptp.h'
will be included anywhere else and the functions would probably not be
used, so compilation warnings about unused static will be returned.
handle of 0 implies from/to of universe realm which is not very
sensible.
Lets see what this patch will do:
$sudo tc qdisc add dev $DEV root handle 1:0 prio
//lets manufacture a way to insert handle of 0
$sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \
route to 0 from 0 classid 1:10 action ok
//gets rejected...
Error: handle of 0 is not valid.
We have an error talking to the kernel, -1
//lets create a legit entry..
sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \
classid 1:10 action ok
//what did the kernel insert?
$sudo tc filter ls dev $DEV parent 1:0
filter protocol ip pref 100 route chain 0
filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
//Lets try to replace that legit entry with a handle of 0
$ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \
handle 0x000a8000 route to 0 from 0 classid 1:10 action drop
Error: Replacing with handle of 0 is invalid.
We have an error talking to the kernel, -1
And last, lets run Cascardo's POC:
$ ./poc
0
0
-22
-22
-22
Xin Xiong [Sat, 13 Aug 2022 12:49:08 +0000 (20:49 +0800)]
net: fix potential refcount leak in ndisc_router_discovery()
The issue happens on specific paths in the function. After both the
object `rt` and `neigh` are grabbed successfully, when `lifetime` is
nonzero but the metric needs change, the function just deletes the
route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh`
again if above conditions hold. The function simply overwrite `neigh`
if succeeds or returns if fails, without decreasing the reference
count of previous `neigh`. This may result in memory leaks.
Fix it by decrementing the reference count of `neigh` in place.
Fixes: 6b2e04bc240f ("net: allow user to set metric on default route learned via Router Advertisement") Signed-off-by: Xin Xiong <[email protected]> Signed-off-by: Xin Tan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Right now we have a neigh_param PROXY_QLEN which specifies maximum length
of neigh_table->proxy_queue. But in fact, this limitation doesn't work well
because check condition looks like:
tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN)
The problem is that p (struct neigh_parms) is a per-device thing,
but tbl (struct neigh_table) is a system-wide global thing.
It seems reasonable to make proxy_queue limit per-device based.
v2:
- nothing changed in this patch
v3:
- rebase to net tree
Denis V. Lunev [Thu, 11 Aug 2022 15:20:11 +0000 (18:20 +0300)]
neigh: fix possible DoS due to net iface start/stop loop
Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
procedure
* the request is put into the queue using pneigh_enqueue() if
corresponding ARP record is not local (common case for container
records on the host)
* the request is processed by timer (within 80 jiffies by default) and
ARP reply is sent from the same arp_process() using
NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
pneigh_enqueue())
And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().
This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.
Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.
The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.
v2:
- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
- rebase to net tree
Maxim Kochetkov [Thu, 11 Aug 2022 09:48:40 +0000 (12:48 +0300)]
net: qrtr: start MHI channel after endpoit creation
MHI channel may generates event/interrupt right after enabling.
It may leads to 2 race conditions issues.
1)
Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check:
if (!qdev || mhi_res->transaction_status)
return;
Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at
this moment. In this situation qrtr-ns will be unable to enumerate
services in device.
---------------------------------------------------------------
2)
Such event may come at the moment after dev_set_drvdata() and
before qrtr_endpoint_register(). In this case kernel will panic with
accessing wrong pointer at qcom_mhi_qrtr_dl_callback():
Because endpoint is not created yet.
--------------------------------------------------------------
So move mhi_prepare_for_transfer_autoqueue after endpoint creation
to fix it.
Samuel Holland [Fri, 12 Aug 2022 03:16:23 +0000 (22:16 -0500)]
drm/sun4i: dsi: Prevent underflow when computing packet sizes
Currently, the packet overhead is subtracted using unsigned arithmetic.
With a short sync pulse, this could underflow and wrap around to near
the maximal u16 value. Fix this by using signed subtraction. The call to
max() will correctly handle any negative numbers that are produced.
Apply the same fix to the other timings, even though those subtractions
are less likely to underflow.
Samuel Holland [Fri, 12 Aug 2022 07:37:02 +0000 (02:37 -0500)]
dt-bindings: display: sun4i: Add D1 TCONs to conditionals
When adding the D1 TCON bindings, I missed the conditional blocks that
restrict the binding for TCON LCD vs TCON TV hardware. Add the D1 TCON
variants to the appropriate blocks for DE2 TCON LCDs and TCON TVs.
This is because pcibios_alloc_controller() holds hose_spinlock but
of_alias_get_id() takes of_mutex which can sleep.
The hose_spinlock protects the phb_bitmap, and also the hose_list, but
it doesn't need to be held while get_phb_number() calls the OF routines,
because those are only looking up information in the device tree.
So fix it by having get_phb_number() take the hose_spinlock itself, only
where required, and then dropping the lock before returning.
pcibios_alloc_controller() then needs to take the lock again before the
list_add() but that's safe, the order of the list is not important.
Yury Norov [Fri, 12 Aug 2022 05:34:25 +0000 (22:34 -0700)]
radix-tree: replace gfp.h inclusion with gfp_types.h
Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
have gfp_types.h for this.
Fixes powerpc allmodconfig build:
In file included from include/linux/nodemask.h:97,
from include/linux/mmzone.h:17,
from include/linux/gfp.h:7,
from include/linux/radix-tree.h:12,
from include/linux/idr.h:15,
from include/linux/kernfs.h:12,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/pci.h:35,
from arch/powerpc/kernel/prom_init.c:24:
include/linux/random.h: In function 'add_latent_entropy':
>> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
25 | add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
| ^~~~~~~~~~~~~~
| add_latent_entropy
include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in
Linus Torvalds [Sun, 14 Aug 2022 20:03:53 +0000 (13:03 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lseek fix from Al Viro:
"Fix proc_reg_llseek() breakage. Always had been possible if somebody
left NULL ->proc_lseek, became a practical issue now"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
take care to handle NULL ->proc_lseek()
Linus Torvalds [Sun, 14 Aug 2022 16:22:11 +0000 (09:22 -0700)]
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tool updates from Arnaldo Carvalho de Melo:
- 'perf c2c' now supports ARM64, adjust its output to cope with
differences with what is in x86_64. Now go find false sharing on
ARM64 (at least Neoverse) as well!
- Refactor the JSON processing, making the output more compact and thus
reducing the size of the resulting perf binary
- Improvements for 'perf offcpu' profiling, including tracking child
processes
- Update Intel JSON metrics and events files for broadwellde,
broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
knightslanding, sapphirerapids, skylakex and snowridgex
- Add 'perf stat' JSON output and a 'perf test' entry for it
- Ignore memfd and anonymous mmap events if jitdump present
- Fix an error handling path in 'parse_perf_probe_command()'
- Fixes for the guest Intel PT tracing patchkit in the 1st batch of
this merge window
- Print debuginfod queries if -v option is used, to explain delays in
processing when debuginfo servers are enabled to fetch DSOs with
richer symbol tables
- Improve error message for 'perf record -p not_existing_pid'
- Fix openssl and libbpf feature detection
- Add PMU pai_crypto event description for IBM z16 on 'perf list'
- Fix typos and duplicated words on comments in various places
* tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
perf test: Refactor shell tests allowing subdirs
perf vendor events: Update events for snowridgex
perf vendor events: Update events and metrics for skylakex
perf vendor events: Update metrics for sapphirerapids
perf vendor events: Update events for knightslanding
perf vendor events: Update metrics for jaketown
perf vendor events: Update metrics for ivytown
perf vendor events: Update events and metrics for icelakex
perf vendor events: Update events and metrics for haswellx
perf vendor events: Update events and metrics for cascadelakex
perf vendor events: Update events and metrics for broadwellx
perf vendor events: Update metrics for broadwellde
perf jevents: Fold strings optimization
perf jevents: Compress the pmu_events_table
perf metrics: Copy entire pmu_event in find metric
perf pmu-events: Hide the pmu_events
perf pmu-events: Don't assume pmu_event is an array
perf pmu-events: Move test events/metrics to JSON
perf test: Use full metric resolution
perf pmu-events: Hide pmu_events_map
...
Linus Torvalds [Sun, 14 Aug 2022 15:48:13 +0000 (08:48 -0700)]
Merge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
CPUs trap on it rather than ignoring it as they should.
- Fix ftrace when building with clang, which was broken by some
refactoring.
- A couple of other minor fixes.
Thanks to Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej
Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.
* tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kexec: Fix build failure from uninitialised variable
powerpc/ppc-opcode: Fix PPC_RAW_TW()
powerpc64/ftrace: Fix ftrace for clang builds
powerpc: Make eh value more explicit when using lwarx
powerpc: Don't hide eh field of lwarx behind a macro
powerpc: Fix eh field when calling lwarx on PPC32
Nadav Amit [Sat, 13 Aug 2022 22:59:43 +0000 (15:59 -0700)]
x86/kprobes: Fix JNG/JNLE emulation
When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong
condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is
performed if (ZF == 1 or SF != OF). However the kernel emulation
currently uses 'and' instead of 'or'.
As a result, setting a kprobe on JNG/JNLE might cause the kernel to
behave incorrectly whenever the kprobe is hit.
Linus Torvalds [Sun, 14 Aug 2022 00:31:18 +0000 (17:31 -0700)]
Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- two fixes for stable, one for a lock length miscalculation, and
another fixes a lease break timeout bug
- improvement to handle leases, allows the close timeout to be
configured more safely
- five restructuring/cleanup patches
* tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
cifs: Add constructor/destructors for tcon->cfid
SMB3: fix lease break timeout when multiple deferred close handles for the same file.
smb3: allow deferred close timeout to be configurable
cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
cifs: Move cached-dir functions into a separate file
cifs: Remove {cifs,nfs}_fscache_release_page()
cifs: fix lock length calculation
David Howells [Wed, 10 Aug 2022 17:52:47 +0000 (18:52 +0100)]
afs: Enable multipage folio support
Enable multipage folio support for the afs filesystem.
Support has already been implemented in netfslib, fscache and cachefiles
and in most of afs, but I've waited for Matthew Wilcox's latest folio
changes.
Note that it does require a change to afs_write_begin() to return the
correct subpage. This is a "temporary" change as we're working on
getting rid of the need for ->write_begin() and ->write_end()
completely, at least as far as network filesystems are concerned - but
it doesn't prevent afs from making use of the capability.
Linus Torvalds [Sat, 13 Aug 2022 21:38:22 +0000 (14:38 -0700)]
Merge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Misc timer fixes:
- fix a potential use-after-free bug in posix timers
- correct a prototype
- address a build warning"
* tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Cleanup CPU timers before freeing them during exec
time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
posix-timers: Make do_clock_gettime() static
Linus Torvalds [Sat, 13 Aug 2022 21:24:12 +0000 (14:24 -0700)]
Merge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
turned on by default), which also need STIBP enabled (if available) to
be '100% safe' on even the shortest speculation windows"
* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Enable STIBP for IBPB mitigated RETBleed
Linus Torvalds [Sat, 13 Aug 2022 21:00:45 +0000 (14:00 -0700)]
Merge tag 'ntb-5.20' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Non-Transparent Bridge updates.
Fix of heap data and clang warnings, support for a new Intel NTB
device, and NTB EndPoint Function (EPF) support and the various fixes
for that"
* tag 'ntb-5.20' of https://github.com/jonmason/ntb:
MAINTAINERS: add PCI Endpoint NTB drivers to NTB files
NTB: EPF: Tidy up some bounds checks
NTB: EPF: Fix error code in epf_ntb_bind()
PCI: endpoint: pci-epf-vntb: reduce several globals to statics
PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
PCI: endpoint: Fix Kconfig dependency
NTB: EPF: set pointer addr to null using NULL rather than 0
Documentation: PCI: extend subheading underline for "lspci output" section
Documentation: PCI: Use code-block block for scratchpad registers diagram
Documentation: PCI: Add specification for the PCI vNTB function device
PCI: endpoint: Support NTB transfer between RC and EP
NTB: epf: Allow more flexibility in the memory BAR map method
PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address
ntb: intel: add GNR support for Intel PCIe gen5 NTB
NTB: ntb_tool: uninitialized heap data in tool_fn_write()
ntb: idt: fix clang -Wformat warnings
Linus Torvalds [Sat, 13 Aug 2022 20:50:11 +0000 (13:50 -0700)]
Merge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more xfs updates from Darrick Wong:
"There's not a lot this time around, just the usual bug fixes and
corrections for missing error returns.
- Return error codes from block device flushes to userspace
- Fix a deadlock between reclaim and mount time quotacheck
- Fix an unnecessary ENOSPC return when doing COW on a filesystem
with severe free space fragmentation
- Fix a miscalculation in the transaction reservation computations
for file removal operations"
* tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix inode reservation space for removing transaction
xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
xfs: fix intermittent hang during quotacheck
xfs: check return codes when flushing block devices
Linus Torvalds [Sat, 13 Aug 2022 20:41:48 +0000 (13:41 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Mostly small bug fixes and trivial updates.
The major new core update is a change to the way device, target and
host reference counting is done to try to make it more robust (this
change has soaked for a while to try to winkle out any bugs)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: pm8001: Fix typo 'the the' in comment
scsi: megaraid_sas: Remove redundant variable cmd_type
scsi: FlashPoint: Remove redundant variable bm_int_st
scsi: zfcp: Fix missing auto port scan and thus missing target ports
scsi: core: Call blk_mq_free_tag_set() earlier
scsi: core: Simplify LLD module reference counting
scsi: core: Make sure that hosts outlive targets
scsi: core: Make sure that targets outlive devices
scsi: ufs: ufs-pci: Correct check for RESET DSM
scsi: target: core: De-RCU of se_lun and se_lun acl
scsi: target: core: Fix race during ACL removal
scsi: ufs: core: Correct ufshcd_shutdown() flow
scsi: ufs: core: Increase the maximum data buffer size
scsi: lpfc: Check the return value of alloc_workqueue()
Linus Torvalds [Sat, 13 Aug 2022 20:37:36 +0000 (13:37 -0700)]
Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- print nvme connect Linux error codes properly (Amit Engel)
- fix the fc_appid_store return value (Christoph Hellwig)
- fix a typo in an error message (Christophe JAILLET)
- add another non-unique identifier quirk (Dennis P. Kliem)
- check if the queue is allocated before stopping it in nvme-tcp
(Maurizio Lombardi)
- restart admin queue if the caller needs to restart queue in
nvme-fc (Ming Lei)
- use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang
Xiaoxu)
- __alloc_disk_node() error handling fix (Rafael)
* tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block:
block: Do not call blk_put_queue() if gendisk allocation fails
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70
nvme-tcp: check if the queue is allocated before stopping it
nvme-fabrics: Fix a typo in an error message
nvme-fabrics: parse nvme connect Linux error codes
nvmet-auth: use kmemdup instead of kmalloc + memcpy
nvme-fc: fix the fc_appid_store return value
nvme-fc: restart admin queue if the caller needs to restart queue
Linus Torvalds [Sat, 13 Aug 2022 20:28:54 +0000 (13:28 -0700)]
Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Regression fix for this merge window, fixing a wrong order of
arguments for io_req_set_res() for passthru (Dylan)
- Fix for the audit code leaking context memory (Peilin)
- Ensure that provided buffers are memcg accounted (Pavel)
- Correctly handle short zero-copy sends (Pavel)
- Sparse warning fixes for the recvmsg multishot command (Dylan)
- Error handling fix for passthru (Anuj)
- Remove randomization of struct kiocb fields, to avoid it growing in
size if re-arranged in such a fashion that it grows more holes or
padding (Keith, Linus)
- Small series improving type safety of the sqe fields (Stefan)
* tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block:
io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields
io_uring: make io_kiocb_to_cmd() typesafe
fs: don't randomize struct kiocb fields
io_uring: consistently make use of io_notif_to_data()
io_uring: fix error handling for io_uring_cmd
io_uring: fix io_recvmsg_prep_multishot sparse warnings
io_uring/net: send retry for zerocopy
io_uring: mem-account pbuf buckets
audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
io_uring: pass correct parameters to io_req_set_res
Carsten Haitzler [Fri, 12 Aug 2022 12:16:28 +0000 (13:16 +0100)]
perf test: Refactor shell tests allowing subdirs
This is a prelude to adding more tests to shell tests and in order to
support putting those tests into subdirectories, I need to change the
test code that scans/finds and runs them.
To support subdirs I have to recurse so it's time to refactor the code
to allow this and centralize the shell script finding into one location
and only one single scan that builds a list of all the found tests in
memory instead of it being duplicated in 3 places.
This code also optimizes things like knowing the max width of desciption
strings (as we can do that while we scan instead of a whole new pass of
opening files).
It also more cleanly filters scripts to see only *.sh files thus
skipping random other files in directories like *~ backup files, other
random junk/data files that may appear and the scripts must be
executable to make the cut (this ensures the script lib dir is not seen
as scripts to run).
This avoids perf test running previous older versions of test scripts
that are editor backup files as well as skipping perf.data files that
may appear and so on.
Ian Rogers [Fri, 12 Aug 2022 23:09:49 +0000 (16:09 -0700)]
perf jevents: Fold strings optimization
If a shorter string ends a longer string then the shorter string may
reuse the longer string at an offset. For example, on x86 the event
arith.cycles_div_busy and cycles_div_busy can be folded, even though
they have difference names the strings are identical after 6
characters. cycles_div_busy can reuse the arith.cycles_div_busy string
at an offset of 6.
In pmu-events.c this looks like the following where the 'also:' lists
folded strings:
/* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */
As jevents.py combines multiple strings for an event into a larger
string, the amount of folding is minimal as all parts of the event must
align. Other organizations can benefit more from folding, but lose space
by say recording more offsets.
Ian Rogers [Fri, 12 Aug 2022 23:09:48 +0000 (16:09 -0700)]
perf jevents: Compress the pmu_events_table
The pmu_events array requires 15 pointers per entry which in position
independent code need relocating. Change the array to be an array of
offsets within a big C string. Only the offset of the first variable is
required, subsequent variables are stored in order after the \0
terminator (requiring a byte per variable rather than 4 bytes per
offset).
The file size savings are:
no jevents - the same 19,788,464bytes
x86 jevents - ~16.7% file size saving 23,744,288bytes vs 28,502,632bytes
all jevents - ~19.5% file size saving 24,469,056bytes vs 30,379,920bytes
default build options plus NO_LIBBFD=1.
For example, the x86 build savings come from .rela.dyn and
.data.rel.ro becoming smaller by 3,157,032bytes and 3,042,016bytes
respectively. .rodata increases by 1,432,448bytes, giving an overall
4,766,600bytes saving.
To make metric strings more shareable, the topic is changed from say
'skx metrics' to just 'metrics'.
To try to help with the memory layout the pmu_events are ordered as used
by perf qsort comparator functions.
Ian Rogers [Fri, 12 Aug 2022 23:09:47 +0000 (16:09 -0700)]
perf metrics: Copy entire pmu_event in find metric
The pmu_event passed to the pmu_events_table_for_each_event is invalid
after the loop. Copy the entire struct in metricgroup__find_metric.
Reduce the scope of this function to static.
Ian Rogers [Fri, 12 Aug 2022 23:09:43 +0000 (16:09 -0700)]
perf test: Use full metric resolution
The simple metric resolution doesn't handle recursion properly, switch
to use the full resolution as with the parse-metric tests which also
increases coverage. Don't set the values for the metric backward as
failures to generate a result are ignored.
Ian Rogers [Fri, 12 Aug 2022 23:09:41 +0000 (16:09 -0700)]
perf pmu-events: Avoid passing pmu_events_map
Preparation for hiding pmu_events_map as an implementation detail. While
the map is passed, the table of events is all that is normally wanted.
While modifying the function's types, rename pmu_events_map__find to
pmu_events_table__find to match later encapsulation. Similarly rename
pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table.
Ian Rogers [Fri, 12 Aug 2022 23:09:39 +0000 (16:09 -0700)]
perf jevents: Sort JSON files entries
Sort the JSON files entries on conversion to C. The sort order tries to
replicated cmp_sevent from pmu.c so that the input there is already
sorted except for sysfs events. Specifically, the sort order is given by
the tuple:
(not j.desc is None, fix_none(j.topic), fix_none(j.name), fix_none(j.pmu), fix_none(j.metric_name))
which is putting events with descriptions and topics before those
without, then sorting by name, then pmu and finally metric_name
Add the topic to JsonEvent on reading to simplify. Remove an unnecessary
lambda in the JSON reading.