Dave Ertman [Tue, 10 Oct 2023 17:32:15 +0000 (10:32 -0700)]
ice: Fix SRIOV LAG disable on non-compliant aggregate
If an attribute of an aggregate interface disqualifies it from supporting
SRIOV, the driver will unwind the SRIOV support. Currently the driver is
clearing the feature bit for all interfaces in the aggregate, but this is
not allowing the other interfaces to unwind successfully on driver unload.
Only clear the feature bit for the interface that is currently unwinding.
Ivan Vecera [Tue, 24 Oct 2023 12:51:08 +0000 (14:51 +0200)]
i40e: Do not call devlink_port_type_clear()
Do not call devlink_port_type_clear() prior devlink port unregister
and let devlink core to take care about it.
Reproducer:
[root@host ~]# rmmod i40e
[ 4539.964699] i40e 0000:02:00.0: devlink port type for port 0 cleared without a software interface reference, device type not supported by the kernel?
[ 4540.319811] i40e 0000:02:00.1: devlink port type for port 1 cleared without a software interface reference, device type not supported by the kernel?
Fixes: 9e479d64dc58 ("i40e: Add initial devlink support") Signed-off-by: Ivan Vecera <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
Linus Torvalds [Mon, 6 Nov 2023 23:06:06 +0000 (15:06 -0800)]
Merge tag 'media/v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- the old V4L2 core videobuf kAPI was finally removed. All media
drivers should now be using VB2 kAPI
- new automotive driver: mgb4
- new platform video driver: npcm-video
- new sensor driver: mt9m114
- new TI driver used in conjunction with Cadence CSI2RX IP to bridge
TI-specific parts
- ir-rx51 was removed and the N900 DT binding was moved to the
pwm-ir-tx generic driver
- drop atomisp-specific ov5693, using the upstream driver instead
- the camss driver has gained RDI3 support for VFE 17x
- the atomisp driver now detects ISP2400 or ISP2401 at run time. No
need to set it up at build time anymore
- lots of driver fixes, cleanups and improvements
* tag 'media/v6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (377 commits)
media: nuvoton: VIDEO_NPCM_VCD_ECE should depend on ARCH_NPCM
media: venus: Fix firmware path for resources
media: venus: hfi_cmds: Replace one-element array with flex-array member and use __counted_by
media: venus: hfi_parser: Add check to keep the number of codecs within range
media: venus: hfi: add checks to handle capabilities from firmware
media: venus: hfi: fix the check to handle session buffer requirement
media: venus: hfi: add checks to perform sanity on queue pointers
media: platform: cadence: select MIPI_DPHY dependency
media: MAINTAINERS: Fix path for J721E CSI2RX bindings
media: cec: meson: always include meson sub-directory in Makefile
media: videobuf2: Fix IS_ERR checking in vb2_dc_put_userptr()
media: platform: mtk-mdp3: fix uninitialized variable in mdp_path_config()
media: mediatek: vcodec: using encoder device to alloc/free encoder memory
media: imx-jpeg: notify source chagne event when the first picture parsed
media: cx231xx: Use EP5_BUF_SIZE macro
media: siano: Drop unnecessary error check for debugfs_create_dir/file()
media: mediatek: vcodec: Handle invalid encoder vsi
media: aspeed: Drop unnecessary error check for debugfs_create_file()
Documentation: media: buffer.rst: fix V4L2_BUF_FLAG_PREPARED
Documentation: media: gen-errors.rst: fix confusing ENOTTY description
...
Dylan Yudaken [Mon, 6 Nov 2023 20:39:09 +0000 (20:39 +0000)]
io_uring: do not clamp read length for multishot read
When doing a multishot read, the code path reuses the old read
paths. However this breaks an assumption built into those paths,
namely that struct io_rw::len is available for reuse by __io_import_iovec.
For multishot this results in len being set for the first receive
call, and then subsequent calls are clamped to that buffer length
incorrectly.
Instead keep len as zero after recycling buffers, to reuse the full
buffer size of the next selected buffer.
Dylan Yudaken [Mon, 6 Nov 2023 20:39:08 +0000 (20:39 +0000)]
io_uring: do not allow multishot read to set addr or len
For addr: this field is not used, since buffer select is forced.
But by forcing it to be zero it leaves open future uses of the field.
len is actually usable, you could imagine that you want to receive
multishot up to a certain length.
However right now this is not how it is implemented, and it seems
safer to force this to be zero.
Clément Léger [Tue, 24 Oct 2023 13:26:52 +0000 (15:26 +0200)]
riscv: Use SYM_*() assembly macros instead of deprecated ones
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the
new SYM_*() macros [1] for better annotation of symbols. Replace the
deprecated ones with the new ones and fix wrong usage of END()/ENDPROC()
to correctly describe the symbols.
Clément Léger [Tue, 24 Oct 2023 13:26:51 +0000 (15:26 +0200)]
riscv: use ".L" local labels in assembly when applicable
For the sake of coherency, use local labels in assembly when
applicable. This also avoid kprobes being confused when applying a
kprobe since the size of function is computed by checking where the
next visible symbol is located. This might end up in computing some
function size to be way shorter than expected and thus failing to apply
kprobes to the specified offset.
[ERROR] This is an ELF file and cannot be programmed to flash directly: arch/riscv/boot/loader.bin
Before, loader.bin relied on "OBJCOPYFLAGS := -O binary" in the main
RISC-V Makefile to create a boot image with the right format. With this
removed, the image is now created in the wrong (ELF) format.
Hannes Reinecke [Tue, 24 Oct 2023 06:13:37 +0000 (08:13 +0200)]
nvme: start keep-alive after admin queue setup
Setting up I/O queues might take quite some time on larger and/or
busy setups, so KATO might expire before all I/O queues could be
set up.
Fix this by start keep alive from the ->init_ctrl_finish() callback,
and stopping it when calling nvme_cancel_admin_tagset().
Hannes Reinecke [Tue, 24 Oct 2023 06:13:36 +0000 (08:13 +0200)]
nvme-loop: always quiesce and cancel commands before destroying admin q
Once ->init_ctrl_finish() is called there may be commands outstanding,
so we should quiesce the admin queue and cancel all commands prior
to call nvme_loop_destroy_admin_queue().
Mark O'Donovan [Wed, 25 Oct 2023 10:51:25 +0000 (10:51 +0000)]
nvme-auth: always set valid seq_num in dhchap reply
Currently a seqnum of zero is sent during uni-directional
authentication. The zero value is reserved for the secure channel
feature which is not yet implemented.
Relevant extract from the spec:
The value 0h is used to indicate that bidirectional authentication
is not performed, but a challenge value C2 is carried in order to
generate a pre-shared key (PSK) for subsequent establishment of a
secure channel
Mark O'Donovan [Wed, 25 Oct 2023 10:51:24 +0000 (10:51 +0000)]
nvme-auth: add flag for bi-directional auth
Introduces an explicit variable for bi-directional auth.
The currently used variable chap->s2 is incorrectly zeroed for
uni-directional auth. That will be fixed in the next patch so this
needs to change to avoid sending unexpected success2 messages
Daniel Wagner [Mon, 30 Oct 2023 16:00:44 +0000 (17:00 +0100)]
nvme: update firmware version after commit
The firmware version sysfs entry needs to be updated after a successfully
firmware activation.
nvme-cli stopped issuing an Identify Controller command to list the
current firmware information and relies on sysfs showing the current
firmware version.
All error handling path end to the error handling path, except this one.
Go to the error handling branch as well here, otherwise 'icreq' and
'icresp' will leak.
Fixes: 2837966ab2a8 ("nvme-tcp: control message handling for recvmsg()") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
Eric Biggers [Sun, 29 Oct 2023 05:00:40 +0000 (22:00 -0700)]
nvme-auth: use crypto_shash_tfm_digest()
Simplify nvme_auth_augmented_challenge() by using
crypto_shash_tfm_digest() instead of an alloc+init+update+final
sequence. This should also improve performance.
This series optimizes the tlb flushes on riscv which used to simply
flush the whole tlb whatever the size of the range to flush or the size
of the stride.
Patch 3 introduces a threshold that is microarchitecture specific and
will very likely be modified by vendors, not sure though which mechanism
we'll use to do that (dt? alternatives? vendor initialization code?).
* b4-shazam-merge:
riscv: Improve flush_tlb_kernel_range()
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
riscv: Improve flush_tlb_range() for hugetlb pages
riscv: Improve tlb_flush()
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:28 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_kernel_range()
This function used to simply flush the whole tlb of all harts, be more
subtile and try to only flush the range.
The problem is that we can only use PAGE_SIZE as stride since we don't know
the size of the underlying mapping and then this function will be improved
only if the size of the region to flush is < threshold * PAGE_SIZE.
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:27 +0000 (14:30 +0100)]
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
tlb comes with a greater cost than flushing a single entry so we should
flush single entries up to a certain threshold so that:
threshold * cost of flushing a single entry < cost of flushing the whole
tlb.
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:26 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_range() for hugetlb pages
flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form,
when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the
whole tlb: so set a stride of the size of the hugetlb mapping in order to
only flush the hugetlb mapping. However, if the hugepage is a NAPOT region,
all PTEs that constitute this mapping must be invalidated, so the stride
size must actually be the size of the PTE.
Note that THPs are directly handled by flush_pmd_tlb_range().
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:25 +0000 (14:30 +0100)]
riscv: Improve tlb_flush()
For now, tlb_flush() simply calls flush_tlb_mm() which results in a
flush of the whole TLB. So let's use mmu_gather fields to provide a more
fine-grained flush of the TLB.
Nirmoy Das [Thu, 26 Oct 2023 12:56:36 +0000 (14:56 +0200)]
drm/i915/tc: Fix -Wformat-truncation in intel_tc_port_init
Fix below compiler warning:
intel_tc.c:1879:11: error: ‘%d’ directive output may be truncated
writing between 1 and 11 bytes into a region of size 3
[-Werror=format-truncation=]
"%c/TC#%d", port_name(port), tc_port + 1);
^~
intel_tc.c:1878:2: note: ‘snprintf’ output between 7 and 17 bytes
into a destination of size 8
snprintf(tc->port_name, sizeof(tc->port_name),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%c/TC#%d", port_name(port), tc_port + 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
v2: use kasprintf(Imre)
v3: use const for port_name, and fix tc mem leak(Imre)
Ville Syrjälä [Tue, 31 Oct 2023 16:08:00 +0000 (18:08 +0200)]
drm/i915: Bump GLK CDCLK frequency when driving multiple pipes
On GLK CDCLK frequency needs to be at least 2*96 MHz when accessing
the audio hardware. Currently we bump the CDCLK frequency up
temporarily (if not high enough already) whenever audio hardware
is being accessed, and drop it back down afterwards.
With a single active pipe this works just fine as we can switch
between all the valid CDCLK frequencies by changing the cd2x
divider, which doesn't require a full modeset. However with
multiple active pipes the cd2x divider trick no longer works,
and thus we end up blinking all displays off and back on.
To avoid this let's just bump the CDCLK frequency to >=2*96MHz
whenever multiple pipes are active. The downside is slightly
higher power consumption, but that seems like an acceptable
tradeoff. With a single active pipe we can stick to the current
more optiomal (from power comsumption POV) behaviour.
Jerome Brunet [Mon, 6 Nov 2023 10:40:11 +0000 (11:40 +0100)]
ASoC: hdmi-codec: register hpd callback on component probe
The HDMI hotplug callback to the hdmi-codec is currently registered when
jack is set.
The hotplug not only serves to report the ASoC jack state but also to get
the ELD. It should be registered when the component probes instead, so it
does not depend on the card driver registering a jack for the HDMI to
properly report the ELD.
D. Wythe [Fri, 3 Nov 2023 06:07:40 +0000 (14:07 +0800)]
net/smc: put sk reference if close work was canceled
Note that we always hold a reference to sock when attempting
to submit close_work. Therefore, if we have successfully
canceled close_work from pending, we MUST release that reference
to avoid potential leaks.
Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups") Signed-off-by: D. Wythe <[email protected]> Reviewed-by: Dust Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
D. Wythe [Fri, 3 Nov 2023 06:07:39 +0000 (14:07 +0800)]
net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
This patch re-fix the issues mentioned by commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
Blocking sending message do solve the issues though, but it also
prevents the peer to receive the final message. Besides, in logic,
whether the sndbuf_desc is NULL or not have no impact on the processing
of cdc message sending.
Hence that, this patch allows the cdc message sending but to check the
sndbuf_desc with care in smc_cdc_tx_handler().
Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()") Signed-off-by: D. Wythe <[email protected]> Reviewed-by: Dust Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
if the DEAD flag lost, the state SMC_CLOSED will be never be reached
in smc_close_passive_work:
if (sock_flag(sk, SOCK_DEAD) &&
smc_close_sent_any_close(conn)) {
sk->sk_state = SMC_CLOSED;
} else {
/* just shutdown, but not yet closed locally */
sk->sk_state = SMC_APPFINCLOSEWAIT;
}
Replace sock_set_flags or __set_bit to set_bit will fix this problem.
Since set_bit is atomic.
Jakub Kicinski [Thu, 2 Nov 2023 18:52:27 +0000 (11:52 -0700)]
nfsd: regenerate user space parsers after ynl-gen changes
Commit 8cea95b0bd79 ("tools: ynl-gen: handle do ops with no input attrs")
added support for some of the previously-skipped ops in nfsd.
Regenerate the user space parsers to fill them in.
This looks confusing as (1) we pack SACKPERM into the leading
2-bytes of the aligned 12-bytes of TS and (2) TCPOLEN_MSS_ALIGNED
is not used. Fortunately, the calculated limit is not wrong as
TCPOLEN_SACKPERM_ALIGNED and TCPOLEN_MSS_ALIGNED are the same value.
However, we should use the proper constant in the formula.
Geetha sowjanya [Tue, 31 Oct 2023 11:23:45 +0000 (16:53 +0530)]
octeontx2-pf: Free pending and dropped SQEs
On interface down, the pending SQEs in the NIX get dropped
or drained out during SMQ flush. But skb's pointed by these
SQEs never get free or updated to the stack as respective CQE
never get added.
This patch fixes the issue by freeing all valid skb's in SQ SG list.
Fixes: b1bc8457e9d0 ("octeontx2-pf: Cleanup all receive buffers in SG descriptor") Signed-off-by: Geetha sowjanya <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jamal Hadi Salim [Sat, 28 Oct 2023 17:16:10 +0000 (13:16 -0400)]
net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config
Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
reproducer [2]. Problem seems to be that classifiers clear 'struct
tcf_result::drop_reason', thereby triggering the warning in
__kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).
Fixed by disambiguating a legit error from a verdict with a bogus drop_reason
ip link add name veth0 type veth peer name veth1
ip link set dev veth0 up
ip link set dev veth1 up
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
Ido reported:
[...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
reproducer [2]. Problem seems to be that classifiers clear 'struct
tcf_result::drop_reason', thereby triggering the warning in
__kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
ip link add name veth0 type veth peer name veth1
ip link set dev veth0 up
ip link set dev veth1 up
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
What happens is that inside most classifiers the tcf_result is copied over
from a filter template e.g. *res = f->res which then implicitly overrides
the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
set via sch_handle_{ingress,egress}() for kfree_skb_reason().
Commit text above copied verbatim from Daniel. The general idea of the patch
is not very different from what Ido originally posted but instead done at the
cls_api codepath.
Zongmin Zhou [Tue, 1 Aug 2023 02:53:09 +0000 (10:53 +0800)]
drm/qxl: prevent memory leak
The allocated memory for qdev->dumb_heads should be released
in qxl_destroy_monitors_object before qxl suspend.
otherwise,qxl_create_monitors_object will be called to
reallocate memory for qdev->dumb_heads after qxl resume,
it will cause memory leak.
Jia He [Sat, 28 Oct 2023 10:20:59 +0000 (10:20 +0000)]
dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM
There is an unusual case that the range map covers right up to the top
of system RAM, but leaves a hole somewhere lower down. Then it prevents
the nvme device dma mapping in the checking path of phys_to_dma() and
causes the hangs at boot.
E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
Method (_DMA, 0, Serialized) // _DMA: Direct Memory Access
{
Name (RBUF, ResourceTemplate ()
{
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000000000000000, // Range Minimum
0x00000000FFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000000100000000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000006010200000, // Range Minimum
0x000000602FFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x000000001FE00000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x00000060F0000000, // Range Minimum
0x00000060FFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000000010000000, // Length
,, , AddressRangeMemory, TypeStatic)
QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
0x0000000000000000, // Granularity
0x0000007000000000, // Range Minimum
0x000003FFFFFFFFFF, // Range Maximum
0x0000000000000000, // Translation Offset
0x0000039000000000, // Length
,, , AddressRangeMemory, TypeStatic)
})
Jia He [Sat, 28 Oct 2023 10:20:58 +0000 (10:20 +0000)]
dma-mapping: move dma_addressing_limited() out of line
This patch moves dma_addressing_limited() out of line, serving as a
preliminary step to prevent the introduction of a new publicly accessible
low-level helper when validating whether all system RAM is mapped within
the DMA mapping range.
powerpc: Remove file parameter from phys_mem_access_prot()
Remove 'file' parameter from struct machdep_calls.phys_mem_access_prot
and its implementation in pci_phys_mem_access_prot(). The file is not
used on PowerPC. By removing it, a later patch can simplify fbdev's
mmap code, which uses phys_mem_access_prot() on PowerPC.
Linus Torvalds [Mon, 6 Nov 2023 02:49:40 +0000 (18:49 -0800)]
Merge tag 'rtc-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There is a new driver for the RTC of the Mstar SSD202D SoC. The
rtc7301 driver gains support for byte addresses to support the
USRobotics USR8200. Then we have many non user visible changes and
typo fixes.
Summary:
Subsytem:
- convert platform drivers to remove_new
- prevent modpost warnings for unremovable platform drivers
New driver:
- Mstar SSD202D
Drivers:
- brcmstb-waketimer: support level alarm_irq
- ep93xx: add DT support
- rtc7301: support byte-addressed IO"
* tag 'rtc-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (28 commits)
dt-bindings: rtc: Add Mstar SSD202D RTC
rtc: Add support for the SSD202D RTC
rtc: at91rm9200: annotate at91_rtc_remove with __exit again
dt-bindings: rtc: microcrystal,rv3032: Document wakeup-source property
dt-bindings: rtc: pcf8523: Convert to YAML
dt-bindings: rtc: mcp795: move to trivial-rtc
rtc: ep93xx: add DT support for Cirrus EP93xx
dt-bindings: rtc: Add Cirrus EP93xx
dt-bindings: rtc: pcf2123: convert to YAML
rtc: efi: fixed typo in efi_procfs()
rtc: omap: Use device_get_match_data()
rtc: pcf85363: fix wrong mask/val parameters in regmap_update_bits call
rtc: rtc7301: Support byte-addressed IO
rtc: rtc7301: Rewrite bindings in schema
rtc: sh: Convert to platform remove callback returning void
rtc: pxa: Convert to platform remove callback returning void
rtc: mv: Convert to platform remove callback returning void
rtc: imxdi: Convert to platform remove callback returning void
rtc: at91rm9200: Convert to platform remove callback returning void
rtc: pcap: Drop no-op remove function
...
Linus Torvalds [Mon, 6 Nov 2023 02:45:32 +0000 (18:45 -0800)]
Merge tag 'mailbox-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
- imx: add support for TX Doorbell v2
- mtk: implement runtime PM
- zynqmp: add destination mailbox compatible
- qcom:
- add another clock provider for IPQ
- add SM8650 compatible
- misc: use preferred device_get_match_data()
* tag 'mailbox-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox:
dt-bindings: mailbox: qcom-ipcc: document the SM8650 Inter-Processor Communication Controller
mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend
mailbox: Use device_get_match_data()
dt-bindings: zynqmp: add destination mailbox compatible
dt-bindings: mailbox: qcom: add one more clock provider for IPQ mailbox
mailbox: imx: support channel type tx doorbell v2
dt-bindings: mailbox: fsl,mu: add new tx doorbell channel
gfs2: don't withdraw if init_threads() got interrupted
In gfs2_fill_super(), when mounting a gfs2 filesystem is interrupted,
kthread_create() can return -EINTR. When that happens, we roll back
what has already been done and abort the mount.
Since commit 62dd0f98a0e5 ("gfs2: Flag a withdraw if init_threads()
fails), we are calling gfs2_withdraw_delayed() in gfs2_fill_super();
first via gfs2_make_fs_rw(), then directly. But gfs2_withdraw_delayed()
only marks the filesystem as withdrawing and relies on a caller further
up the stack to do the actual withdraw, which doesn't exist in the
gfs2_fill_super() case. Because the filesystem is marked as withdrawing
/ withdrawn, function gfs2_lm_unmount() doesn't release the dlm
lockspace, so when we try to mount that filesystem again, we get:
Since commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the
deadlock this gfs2_withdraw_delayed() call was supposed to work around
cannot occur anymore because freeze_go_callback() won't take the
sb->s_umount semaphore unconditionally anymore, so we can get rid of the
gfs2_withdraw_delayed() in gfs2_fill_super() entirely.
Su Hui [Thu, 2 Nov 2023 01:51:42 +0000 (09:51 +0800)]
gfs2: remove dead code in add_to_queue
clang static analyzer complains that value stored to 'gh' is never read.
The code of this line is useless after commit 0b93bac2271e
("gfs2: Remove LM_FLAG_PRIORITY flag"). Remove this code to save space.
Juntong Deng [Sun, 29 Oct 2023 21:10:06 +0000 (05:10 +0800)]
gfs2: Fix slab-use-after-free in gfs2_qd_dealloc
In gfs2_put_super(), whether withdrawn or not, the quota should
be cleaned up by gfs2_quota_cleanup().
Otherwise, struct gfs2_sbd will be freed before gfs2_qd_dealloc (rcu
callback) has run for all gfs2_quota_data objects, resulting in
use-after-free.
Also, gfs2_destroy_threads() and gfs2_quota_cleanup() is already called
by gfs2_make_fs_ro(), so in gfs2_put_super(), after calling
gfs2_make_fs_ro(), there is no need to call them again.
gfs2: Silence "suspicious RCU usage in gfs2_permission" warning
Commit 0abd1557e21c added rcu_dereference() for dereferencing ip->i_gl
in gfs2_permission. This now causes lockdep to complain when
gfs2_permission is called in non-RCU context:
WARNING: suspicious RCU usage in gfs2_permission
Switch to rcu_dereference_check() and check for the MAY_NOT_BLOCK flag
to shut up lockdep when we know that dereferencing ip->i_gl is safe.
Fixes: 0abd1557e21c ("gfs2: fix an oops in gfs2_permission") Reported-by: [email protected] Signed-off-by: Andreas Gruenbacher <[email protected]>
Amir Goldstein [Tue, 24 Oct 2023 07:55:35 +0000 (10:55 +0300)]
gfs2: fs: derive f_fsid from s_uuid
gfs2 already has optional persistent uuid.
Use that uuid to report f_fsid in statfs(2), same as ext2/ext4/zonefs.
This allows gfs2 to be monitored by fanotify filesystem watch.
for example, with inotify-tools 4.23.8.0, the following command can be
used to watch changes over entire filesystem:
gfs2: No longer use 'extern' in function declarations
For non-static function declarations, external linkage is implied and
the 'extern' keyword isn't needed. Some static checkers complain about
the overuse of 'extern', so clean up all the function declarations.
In addition, remove 'extern' from the definition of
free_local_statfs_inodes(); it isn't needed there, either.
Function gfs2_lookup_simple() is used for looking up inodes in the
metadata directory tree, so rename it to gfs2_lookup_meta() to closer
match its purpose. Clean the function up a little on the way.
gfs2: Minor gfs2_write_jdata_batch PAGE_SIZE cleanup
In gfs2_write_jdata_batch(), to compute the number of blocks, compute
the total size of the folio batch instead of the number of pages it
contains. Not a functional change.
Note that we don't currently allow mounting filesystems with a block
size bigger than the page size. We could change that after converting
the page cache to folios. The page cache would then only contain
block-size or bigger folios, so rounding wouldn't become an issue here.
Update T-Head memory type definitions according to C910 doc [1]
For NC and IO, SH property isn't configurable, hardcoded as SH,
so set SH for NOCACHE and IO.
And also set bit[61](Bufferable) for NOCACHE according to the
table 6.1 in the doc [1].
This series renews one of my last year RFC patch[1], tries to improve
the vdso layout a bit.
patch1 removes useless symbols
patch2 merges .data section of vdso into .rodata because they are
readonly
patch3 is the real renew patch, it removes hardcoded 0x800 .text start
addr. But I rewrite the commit msg per Andrew's suggestions and move
move .note, .eh_frame_hdr, and .eh_frame between .rodata and .text to
keep the actual code well away from the non-instruction data.
* b4-shazam-merge:
riscv: vdso.lds.S: remove hardcoded 0x800 .text start addr
riscv: vdso.lds.S: merge .data section into .rodata section
riscv: vdso.lds.S: drop __alt_start and __alt_end symbols
I believe the hardcoded 0x800 and related comments come from the long
history VDSO_TEXT_OFFSET in x86 vdso code, but commit 5b9304933730
("x86 vDSO: generate vdso-syms.lds") and commit f6b46ebf904f ("x86
vDSO: new layout") removes the comment and hard coding for x86.
Similar as x86 and other arch, riscv doesn't need the rigid layout
using VDSO_TEXT_OFFSET since it "no longer matters to the kernel".
so we could remove the hard coding now, and removing it brings a
small vdso.so and aligns with other architectures.
Also, having enough separation between data and text is important for
I-cache, so similar as x86, move .note, .eh_frame_hdr, and .eh_frame
between .rodata and .text.
Nam Cao [Tue, 29 Aug 2023 08:36:15 +0000 (10:36 +0200)]
riscv: provide riscv-specific is_trap_insn()
uprobes expects is_trap_insn() to return true for any trap instructions,
not just the one used for installing uprobe. The current default
implementation only returns true for 16-bit c.ebreak if C extension is
enabled. This can confuse uprobes if a 32-bit ebreak generates a trap
exception from userspace: uprobes asks is_trap_insn() who says there is no
trap, so uprobes assume a probe was there before but has been removed, and
return to the trap instruction. This causes an infinite loop of entering
and exiting trap handler.
Instead of using the default implementation, implement this function
speficially for riscv with checks for both ebreak and c.ebreak.
s390/mm: use compound page order to distinguish page tables
CRSTs always have size of four pages, while 2KB-size page tables
always occupy a single page. Use that information to distinguish
page tables from CRSTs.
Cease using 4KB pages to host two 2KB PTEs. That greatly
simplifies the memory management code at the expense of
page tables memory footprint.
Instead of two PTEs per 4KB page use only upper half of
the parent page for a single PTE. With that the list of
half-used pages pgtable_list becomes unneeded.
Further, the upper byte of the parent page _refcount
counter does not need to be used for fragments tracking
and could be left alone.
Commit 8211dad62798 ("s390: add pte_free_defer() for
pgtables sharing page") introduced the use of PageActive
flag to coordinate a deferred free with 2KB page table
fragments tracking. Since there is no tracking anymore,
there is no need for using PageActive flag.
Heiko Carstens [Fri, 27 Oct 2023 12:12:39 +0000 (14:12 +0200)]
s390/cmma: rework no-dat handling
Rework the way physical pages are set no-dat / dat:
The old way is:
- Rely on that all pages are initially marked "dat"
- Allocate page tables for the kernel mapping
- Enable dat
- Walk the whole kernel mapping and set PG_arch_1 bit in all struct pages
that belong to pages of kernel page tables
- Walk all struct pages and test and clear the PG_arch_1 bit. If the bit is
not set, set the page state to no-dat
- For all subsequent page table allocations, set the page state to dat
(remove the no-dat state) on allocation time
Change this rather complex logic to a simpler approach:
- Set the whole physical memory (all pages) to "no-dat"
- Explicitly set those page table pages to "dat" which are part of the
kernel image (e.g. swapper_pg_dir)
- For all subsequent page table allocations, set the page state to dat
(remove the no-dat state) on allocation time
In result the code is simpler, and this also allows to get rid of one
odd usage of the PG_arch_1 bit.
Heiko Carstens [Fri, 27 Oct 2023 12:12:38 +0000 (14:12 +0200)]
s390/cmma: move arch_set_page_dat() to header file
In order to be usable for early boot code move the simple
arch_set_page_dat() function to header file, and add its counter-part
arch_set_page_nodat(). Also change the parameters, and the function name
slightly.
This is required since there aren't any struct pages available in early
boot code, and renaming of functions is done to make sure that all users
are converted to the new API.
Instead of a pointer to a struct page a virtual address is passed, and
instead of an order the number of pages for which the page state needs be
set.
Heiko Carstens [Fri, 27 Oct 2023 12:12:37 +0000 (14:12 +0200)]
s390/cmma: move set_page_stable() and friends to header file
In order to be usable for early boot code move the simple set_page_xxx()
function to header file. Also change the parameters, and the function names
slightly.
This is required since there aren't any struct pages available in early
boot code, and renaming of functions is done to make sure that all users
are converted to the new API.
Instead of a pointer to a struct page a virtual address is passed, and
instead of an order the number of pages for which the page state needs be
set.
Heiko Carstens [Fri, 27 Oct 2023 12:12:36 +0000 (14:12 +0200)]
s390/cmma: move parsing of cmma kernel parameter to early boot code
The "cmma=" kernel command line parameter needs to be parsed early for
upcoming changes. Therefore move the parsing code.
Note that EX_TABLE handling of cmma_test_essa() needs to be open-coded,
since the early boot code doesn't have infrastructure for handling expected
exceptions.
Heiko Carstens [Fri, 27 Oct 2023 12:12:35 +0000 (14:12 +0200)]
s390/cmma: cleanup inline assemblies
Cleanup cmma related inline assemblies:
- consolidate inline assemblies
- use symbolic names
- add same white space where missing
- add braces to for-loops which contain a multi-line statement
s390/ap: fix vanishing crypto cards in SE environment
A secure execution (SE, also known as confidential computing)
guest may see asynchronous errors on a crypto firmware queue.
The current implementation to gather information about cards
and queues in ap_queue_info() simple returns if an asynchronous
error is hanging on the firmware queue. If such a situation
happened and it was the only queue visible for a crypto card
within an SE guest, then the card vanished from sysfs as
the AP bus scan function refuses to hold a card without any
type information. As lszcrypt evaluates the sysfs such
a card vanished from the lszcrypt card listing and the
user is baffled and has no way to reset and thus clear the
pending asynchronous error.
This patch improves the named function to also evaluate GR2
of the TAPQ in case of asynchronous error pending. If there
is a not-null value stored in, the info is processed now.
In the end, a queue with pending asynchronous error does not
lead to a vanishing card any more.
Ingo Franzki [Thu, 26 Oct 2023 09:24:03 +0000 (11:24 +0200)]
s390/zcrypt: don't report online if card or queue is in check-stop state
If a card or a queue is in check-stop state, it can't be used by
applications to perform crypto operations. Applications check the 'online'
sysfs attribute to check if a card or queue is usable.
Report a card or queue as offline in case it is in check-stop state.
Furthermore, don't allow to set a card or queue online, if it is in
check-stop state.
This is similar to when the card or the queue is in deconfigured state,
there it is also reported as being offline, and it can't be set online.
Heiko Carstens [Mon, 30 Oct 2023 15:50:47 +0000 (16:50 +0100)]
s390: add USER_STACKTRACE support
Use the perf_callchain_user() code as blueprint to also add support for
USER_STACKTRACE. To describe how to use this cite the commit message of the
LoongArch implementation which came with commit 4d7bf939df08 ("LoongArch:
Add USER_STACKTRACE support"), but replace -fno-omit-frame-pointer option
with the s390 specific -mbackchain option:
====================================================================== To
get the best stacktrace output, you can compile your userspace programs
with frame pointers (at least glibc + the app you are tracing).
1, export "CC = gcc -mbackchain";
2, compile your programs with "CC";
3, use uprobe to get stacktrace output.
Heiko Carstens [Mon, 30 Oct 2023 15:50:46 +0000 (16:50 +0100)]
s390/perf: implement perf_callchain_user()
Daan De Meyer and Neal Gompa reported that s390 does not support perf user
stack unwinding.
This was never implemented since this requires user space to be compiled
with the -mbackchain compile option, which until now no distribution
did. However this is going to change with Fedora. Therefore provide a
perf_callchain_user() implementation.
Note that due to the way s390 sets up stack frames the provided call chains
can contain invalid values. This is especially true for the first stack
frame, where it is not possible to tell if the return address has been
written to the stack already or not.
s390/ap: fix AP bus crash on early config change callback invocation
Fix kernel crash in AP bus code caused by very early invocation of the
config change callback function via SCLP.
After a fresh IML of the machine the crypto cards are still offline and
will get switched online only with activation of any LPAR which has the
card in it's configuration. A crypto card coming online is reported
to the LPAR via SCLP and the AP bus offers a callback function to get
this kind of information. However, it may happen that the callback is
invoked before the AP bus init function is complete. As the callback
triggers a synchronous AP bus scan, the scan may already run but some
internal states are not initialized by the AP bus init function resulting
in a crash like this:
This patch improves the ap_bus_force_rescan() function which is
invoked by the config change callback by checking if a first
initial AP bus scan has been done. If not, the force rescan request
is simple ignored. Anyhow it does not make sense to trigger AP bus
re-scans even before the very first bus scan is complete.
This patch introduces some code lines which check
for interrupt support enabled on an AP queue after
a reply has been received. This invocation has been
chosen as there is a good chance to have the queue
empty at that time. As the enablement of the irq
imples a state machine change the queue should not
have any pending requests or unreceived replies.
s390/ap: rework to use irq info from ap queue status
This patch reworks the irq handling and reporting code
for the AP queue interrupt handling to always use the
irq info from the queue status.
Until now the interrupt status of an AP queue was stored
into a bool variable within the ap_queue struct. This
variable was set on a successful interrupt enablement
and cleared with kicking a reset. However, it may be
that the interrupt state is manipulated outband for
example by a hypervisor. This patch removes this variable
and instead the irq bit from the AP queue status which is
always reflecting the current irq state is used.
Commit 6326c26c1514 ("s390: convert various pgalloc functions
to use ptdescs") missed to convert tlb_remove_table() into
tlb_remove_ptdesc() in few locations.
Linus Torvalds [Sun, 5 Nov 2023 19:02:32 +0000 (09:02 -1000)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"vhost,virtio,vdpa: features, fixes, cleanups.
vdpa/mlx5:
- VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
- new maintainer
vdpa:
- support for vq descriptor mappings
- decouple reset of iotlb mapping from device reset
and fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
vdpa_sim: implement .reset_map support
vdpa/mlx5: implement .reset_map driver op
vhost-vdpa: clean iotlb map during reset for older userspace
vdpa: introduce .compat_reset operation callback
vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
vhost-vdpa: reset vendor specific mapping to initial state in .release
vdpa: introduce .reset_map operation callback
virtio_pci: add check for common cfg size
virtio-blk: fix implicit overflow on virtio_max_dma_size
virtio_pci: add build offset check for the new common cfg items
virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit
vduse: make vduse_class constant
vhost-scsi: Spelling s/preceeding/preceding/g
virtio: kdoc for struct virtio_pci_modern_device
vdpa: Update sysfs ABI documentation
MAINTAINERS: Add myself as mlx5_vdpa driver
virtio-balloon: correct the comment of virtballoon_migratepage()
mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
vdpa/mlx5: Update cvq iotlb mapping on ASID change
vdpa/mlx5: Make iotlb helper functions more generic
...
Linus Torvalds [Sun, 5 Nov 2023 18:50:41 +0000 (08:50 -1000)]
Merge tag 'firewire-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire update from Takashi Sakamoto:
"A slight change for flexible length of array in core function.
Kees Cook provides a patch to annotate the array embedded in fw_node
structure referring to structure member for the length of array. The
annotation would be defined by future extension of C compilers, and
used for access bound-check at run-time enabled by UBSAN and
FORTIFY_SOURCE"
* tag 'firewire-updates-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: Annotate struct fw_node with __counted_by
Linus Torvalds [Sun, 5 Nov 2023 18:41:14 +0000 (08:41 -1000)]
Merge tag 'i2c-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"I2C has largely driver updates for 6.7, i.e. feature additions (like
adding transfers while in atomic mode), using new helpers (like
devm_clk_get_enabled), new IDs, documentation fixes and additions...
you name it.
The core got a memleak fix and better support for nested muxes"
* tag 'i2c-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (53 commits)
i2c: s3c2410: make i2c_s3c_irq_nextbyte() void
i2c: qcom-geni: add ACPI device id for sc8180x
Documentation: i2c: add fault code for not supporting 10 bit addresses
i2c: sun6i-p2wi: Prevent potential division by zero
i2c: mux: demux-pinctrl: Convert to use sysfs_emit_at() API
i2c: i801: Use new helper acpi_use_parent_companion
ACPI: Add helper acpi_use_parent_companion
MAINTAINERS: add YAML file for i2c-demux-pinctrl
i2c: core: fix lockdep warning for sparsely nested adapter chain
i2c: axxia: eliminate kernel-doc warnings
dt-bindings: i2c: i2c-demux-pinctrl: Convert to json-schema
i2c: stm32f7: Use devm_clk_get_enabled()
i2c: stm32f4: Use devm_clk_get_enabled()
i2c: stm32f7: add description of atomic in struct stm32f7_i2c_dev
i2c: fix memleak in i2c_new_client_device()
i2c: exynos5: Calculate t_scl_l, t_scl_h according to i2c spec
i2c: i801: Simplify class-based client device instantiation
i2c: exynos5: add support for atomic transfers
i2c: at91-core: Use devm_clk_get_enabled()
eeprom: at24: add ST M24C64-D Additional Write lockable page support
...
Linus Torvalds [Sun, 5 Nov 2023 18:28:32 +0000 (08:28 -1000)]
Merge tag 'ubifs-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBI and UBIFS updates from Richard Weinberger:
- UBI Fastmap improvements
- Minor issues found by static analysis bots in both UBI and UBIFS
- Fix for wrong dentry length UBIFS in fscrypt mode
* tag 'ubifs-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: ubifs_link: Fix wrong name len calculating when UBIFS is encrypted
ubi: block: Fix use-after-free in ubiblock_cleanup
ubifs: fix possible dereference after free
ubi: fastmap: Add control in 'UBI_IOCATT' ioctl to reserve PEBs for filling pools
ubi: fastmap: Add module parameter to control reserving filling pool PEBs
ubi: fastmap: Fix lapsed wear leveling for first 64 PEBs
ubi: fastmap: Get wl PEB even ec beyonds the 'max' if free PEBs are run out
ubi: fastmap: may_reserve_for_fm: Don't reserve PEB if fm_anchor exists
ubi: fastmap: Remove unneeded break condition while filling pools
ubi: fastmap: Wait until there are enough free PEBs before filling pools
ubi: fastmap: Use free pebs reserved for bad block handling
ubi: Replace erase_block() with sync_erase()
ubi: fastmap: Allocate memory with GFP_NOFS in ubi_update_fastmap
ubi: fastmap: erase_block: Get erase counter from wl_entry rather than flash
ubi: fastmap: Fix missed ec updating after erasing old fastmap data block
ubifs: Fix missing error code err
ubifs: Fix memory leak of bud->log_hash
ubifs: Fix some kernel-doc comments