Linus Torvalds [Fri, 17 Jun 2022 20:17:57 +0000 (15:17 -0500)]
Merge tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail
- Fix trunking detection & cl_max_connect setting
- Avoid pnfs_update_layout() livelocks
- Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE
* tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
sunrpc: set cl_max_connect when cloning an rpc_clnt
pNFS: Avoid a live lock condition in pnfs_update_layout()
pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
Linus Torvalds [Fri, 17 Jun 2022 20:12:20 +0000 (15:12 -0500)]
Merge tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert clipping of PCI host bridge windows to avoid E820 regions,
which broke several machines by forcing unnecessary BAR reassignments
(Hans de Goede)"
* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
Linus Torvalds [Fri, 17 Jun 2022 19:57:42 +0000 (14:57 -0500)]
Merge tag 'printk-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
"Make the global console_sem available for CPU that is handling panic()
or shutdown.
This is an old problem when an existing console lock owner might block
console output, but it became more visible with the kthreads"
* tag 'printk-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Wait for the global console lock when the system is going down
printk: Block console kthreads when direct printing will be required
rethook: Reject getting a rethook if RCU is not watching
Since the rethook_recycle() will involve the call_rcu() for reclaiming
the rethook_instance, the rethook must be set up at the RCU available
context (non idle). This rethook_recycle() in the rethook trampoline
handler is inevitable, thus the RCU available check must be done before
setting the rethook trampoline.
This adds a rcu_is_watching() check in the rethook_try_get() so that
it will return NULL if it is called when !rcu_is_watching().
fprobe, samples: Add use_trace option and show hit/missed counter
Add use_trace option to use trace_printk() instead of pr_info()
so that the handler doesn't involve the RCU operations.
And show the hit and missed counter so that the user can check
how many times the probe handler hit and missed.
Daniel Borkmann [Fri, 17 Jun 2022 19:42:33 +0000 (21:42 +0200)]
bpf, docs: Update some of the JIT/maintenance entries
Various minor updates around some of the BPF-related entries:
JITs for ARM32/NFP/SPARC/X86-32 haven't seen updates in quite a while, thus
for now, mark them as 'Odd Fixes' until they become more actively developed.
JITs for POWERPC/S390 are in good shape and receive active development and
review, thus bump to 'Supported' similar as we have with X86-64/ARM64.
JITs for MIPS/RISC-V are in similar good shape as the ones mentioned above,
but looked after mostly in spare time, thus leave for now in 'Maintained' state.
Add Michael to PPC JIT given he's picking up the patches there, so it better
reflects today's state.
Also, I haven't done much reviewing around BPF sockmap/kTLS after John and I
did the big rework back in the days to integrate sockmap with kTLS.
These days, most of this is taken care by John, Jakub {Sitnicki,Kicinski} and
others in the community, so remove myself from these two.
Lastly, move all BPF-related entries into one place, that is, move the sockmap
one over near rest of BPF.
Prior to 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows. We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.
After 4c5e242d3e93, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions. This unnecessary BAR
reassignment broke several machines.
Guilherme reported that Steam Deck fails to boot after 4c5e242d3e93. We
clipped the window that contained most 32-bit BARs:
BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]
which forced us to reassign all those BARs, for example, this NVMe BAR:
pci 0000:00:01.2: PCI bridge to [bus 01]
pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff]
pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window
All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.
Andy reported a similar failure on an Intel MID platform. Benjamin
reported a similar failure on a VMWare Fusion VM.
Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().
Linus Torvalds [Fri, 17 Jun 2022 18:55:19 +0000 (13:55 -0500)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Revert the moving of the jump labels initialisation before
setup_machine_fdt(). The bug was fixed in drivers/char/random.c.
- Ftrace fixes: branch range check and consistent handling of PLTs.
- Clean rather than invalidate FROM_DEVICE buffers at start of DMA
transfer (safer if such buffer is mapped in user space). A cache
invalidation is done already at the end of the transfer.
- A couple of clean-ups (unexport symbol, remove unused label).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
arm64/cpufeature: Unexport set_cpu_feature()
arm64: ftrace: remove redundant label
arm64: ftrace: consistently handle PLTs.
arm64: ftrace: fix branch range checks
Revert "arm64: Initialize jump labels before setup_machine_fdt()"
Linus Torvalds [Fri, 17 Jun 2022 18:50:24 +0000 (13:50 -0500)]
Merge tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Add missing ELF_DETAILS in vmlinux.lds.S and fix document rendering"
* tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
docs/zh_CN/LoongArch: Fix notes rendering by using reST directives
docs/LoongArch: Fix notes rendering by using reST directives
LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
Linus Torvalds [Fri, 17 Jun 2022 18:45:47 +0000 (13:45 -0500)]
Merge tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the PolarFire SOC's device tree
- A handful of fixes for the recently added Svpmbt support
- An improvement to the Kconfig text for Svpbmt
* tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
riscv: drop cpufeature_apply_feature tracking variable
riscv: fix dependency for t-head errata
riscv: dts: microchip: re-add pdma to mpfs device tree
* tag 'hyperv-fixes-signed-20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
Drivers: hv: vmbus: Release cpu lock in error case
HID: hyperv: Correctly access fields declared as __le16
clocksource: hyper-v: unexport __init-annotated hv_init_clocksource()
Drivers: hv: Fix syntax errors in comments
Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
Linus Torvalds [Fri, 17 Jun 2022 18:22:58 +0000 (11:22 -0700)]
Merge tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- Quirks, quirks, quirks to work around buggy consumer grade
devices (Keith Bush, Ning Wang, Stefan Reiter, Rasheed Hsueh)
- Better kernel messages for devices that need quirking (Keith
Bush)
- Make a kernel message more useful (Thomas Weißschuh)
- MD pull request from Song, with a few fixes
- blk-mq sysfs locking fixes (Ming)
- BFQ stats fix (Bart)
- blk-mq offline queue fix (Bart)
- blk-mq flush request tag fix (Ming)
* tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block:
block/bfq: Enable I/O statistics
blk-mq: don't clear flush_rq from tags->rqs[]
blk-mq: avoid to touch q->elevator without any protection
blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
block: Fix handling of offline queues in blk_mq_alloc_request_hctx()
md/raid5-ppl: Fix argument order in bio_alloc_bioset()
Revert "md: don't unregister sync_thread with reconfig_mutex held"
nvme-pci: disable write zeros support on UMIC and Samsung SSDs
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs
nvme-pci: sk hynix p31 has bogus namespace ids
nvme-pci: smi has bogus namespace ids
nvme-pci: phison e12 has bogus namespace ids
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50
nvme-pci: add trouble shooting steps for timeouts
nvme: add bug report info for global duplicate id
nvme: add device name to warning in uuid_show()
Linus Torvalds [Fri, 17 Jun 2022 18:14:07 +0000 (11:14 -0700)]
Merge tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Bigger than usual at this time, both because we missed -rc2, but also
because of some reverts that we chose to do. In detail:
- Adjust mapped buffer API while we still can (Dylan)
- Mapped buffer fixes (Dylan, Hao, Pavel, me)
- Fix for uring_cmd wrong API usage for task_work (Dylan)
- Fix for bug introduced in fixed file closing (Hao)
- Fix race in buffer/file resource handling (Pavel)
- Revert the NOP support for CQE32 and buffer selection that was
brought up during the merge window (Pavel)
- Remove IORING_CLOSE_FD_AND_FILE_SLOT introduced in this merge
window. The API needs further refining, so just yank it for now and
we'll revisit for a later kernel.
- Series cleaning up the CQE32 support added in this merge window,
making it more integrated rather than sitting on the side (Pavel)"
* tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block: (21 commits)
io_uring: recycle provided buffer if we punt to io-wq
io_uring: do not use prio task_work_add in uring_cmd
io_uring: commit non-pollable provided mapped buffers upfront
io_uring: make io_fill_cqe_aux honour CQE32
io_uring: remove __io_fill_cqe() helper
io_uring: fix ->extra{1,2} misuse
io_uring: fill extra big cqe fields from req
io_uring: unite fill_cqe and the 32B version
io_uring: get rid of __io_fill_cqe{32}_req()
io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
Revert "io_uring: add buffer selection support to IORING_OP_NOP"
Revert "io_uring: support CQE32 for nop operation"
io_uring: limit size of provided buffer ring
io_uring: fix types in provided buffer ring
io_uring: fix index calculation
io_uring: fix double unlock for pbuf select
io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
io_uring: openclose: fix bug of closing wrong fixed file
io_uring: fix not locked access to fixed buf table
io_uring: fix races with buffer table unregister
...
Will Deacon [Fri, 10 Jun 2022 15:12:27 +0000 (16:12 +0100)]
arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers
When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:
(1) If the DMA transfer does not write to every byte in the buffer,
then the unwritten bytes will contain stale data once the transfer
has completed.
(2) If the buffer has a virtual alias in userspace, then stale data
may be visible via this alias during the period between performing
the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.
Linus Torvalds [Fri, 17 Jun 2022 17:09:24 +0000 (10:09 -0700)]
Merge tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull writeback and ext2 fixes from Jan Kara:
"A fix for writeback bug which prevented machines with kdevtmpfs from
booting and also one small ext2 bugfix in IO error handling"
* tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
init: Initialize noop_backing_dev_info early
ext2: fix fs corruption when trying to remove a non-empty directory with IO error
Linus Torvalds [Fri, 17 Jun 2022 17:03:53 +0000 (10:03 -0700)]
Merge tag 'for-5.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix a race in DM core's dm_start_io_acct that could result in double
accounting for abnormal IO (e.g. discards, write zeroes, etc).
- Fix a use-after-free in DM core's dm_put_live_table_bio.
- Fix a race for REQ_NOWAIT bios being issued despite no support from
underlying DM targets (due to DM table reload at an "unlucky" time)
- Fix access beyond allocated bitmap in DM mirror's log.
* tag 'for-5.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm mirror log: round up region bitmap size to BITS_PER_LONG
dm: fix narrow race for REQ_NOWAIT bios being issued despite no support
dm: fix use-after-free in dm_put_live_table_bio
dm: fix race in dm_start_io_acct
Linus Torvalds [Fri, 17 Jun 2022 17:02:26 +0000 (10:02 -0700)]
Merge tag 'hwmon-for-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Add missing lock protection in occ driver
- Add missing comma in board name list in asus-ec-sensors driver
- Fix devicetree bindings for ti,tmp401
* tag 'hwmon-for-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus-ec-sensors) add missing comma in board name list.
hwmon: (occ) Lock mutex in shutdown to prevent race with occ_active
dt-bindings: hwmon: ti,tmp401: Drop 'items' from 'ti,n-factor' property
Linus Torvalds [Fri, 17 Jun 2022 14:58:39 +0000 (07:58 -0700)]
Merge tag 'char-misc-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.19-rc3 that resolve
some reported issues.
They include:
- mei driver fixes
- comedi driver fix
- rtsx build warning fix
- fsl-mc-bus driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
eeprom: at25: Split reads into chunks and cap write size
misc: atmel-ssc: Fix IRQ check in ssc_probe
char: lp: remove redundant initialization of err
Linus Torvalds [Fri, 17 Jun 2022 14:55:24 +0000 (07:55 -0700)]
Merge tag 'staging-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.19-rc3 that resolve
reported issues:
- remove visorbus.h which was forgotten in the -rc1 merge where the
code that used it was removed
- olpc_dcon: mark as broken to allow the DRM developers to evolve the
fbdev api properly without having to deal with this obsolete
driver. It will be removed soon if no one steps up to adopt it and
fix the issues with it.
- rtl8723bs driver fix
- r8188eu driver fix to resolve many reports of the driver being
broken with -rc1.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: Also remove the Unisys visorbus.h
staging: rtl8723bs: Allocate full pwep structure
staging: olpc_dcon: mark driver as broken
staging: r8188eu: Fix warning of array overflow in ioctl_linux.c
staging: r8188eu: fix rtw_alloc_hwxmits error detection for now
Linus Torvalds [Fri, 17 Jun 2022 14:52:43 +0000 (07:52 -0700)]
Merge tag 'tty-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.19-rc3 to
resolve some reported problems:
- 8250 lsr read bugfix
- n_gsm line discipline allocation fix
- qcom serial driver fix for reported lockups that happened in -rc1
- goldfish tty driver fix
All have been in linux-next for a while now with no reported issues"
* tag 'tty-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: Store to lsr_save_flags after lsr read
tty: goldfish: Fix free_irq() on remove
tty: serial: qcom-geni-serial: Implement start_rx callback
serial: core: Introduce callback for start_rx and do stop_rx in suspend only if this callback implementation is present.
tty: n_gsm: Debug output allocation must use GFP_ATOMIC
Linus Torvalds [Fri, 17 Jun 2022 14:50:41 +0000 (07:50 -0700)]
Merge tag 'usb-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 5.19-rc3
They include:
- new usb-serial driver device ids
- usb gadget driver fixes for reported problems
- cdnsp driver fix
- dwc3 driver fixes for reported problems
- dwc3 driver fix for merge problem that I caused in 5.18
- xhci driver fixes
- dwc2 memory leak fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: change ep->ep safe in ffs_epfile_io()
usb: gadget: f_fs: change ep->status safe in ffs_epfile_io()
xhci: Fix null pointer dereference in resume if xhci has only one roothub
USB: fixup for merge issue with "usb: dwc3: Don't switch OTG -> peripheral if extcon is present"
usb: cdnsp: Fixed setting last_trb incorrectly
usb: gadget: u_ether: fix regression in setting fixed MAC address
usb: gadget: lpc32xx_udc: Fix refcount leak in lpc32xx_udc_probe
usb: dwc2: Fix memory leak in dwc2_hcd_init
usb: dwc3: pci: Restore line lost in merge conflict resolution
usb: dwc3: gadget: Fix IN endpoint max packet size allocation
USB: serial: option: add support for Cinterion MV31 with new baseline
USB: serial: io_ti: add Agilent E5805A support
Youling Tang [Mon, 13 Jun 2022 10:54:12 +0000 (18:54 +0800)]
LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
Commit c604abc3f6e ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
splits ELF_DETAILS from STABS_DEBUG, resulting in missing ELF_DETAILS
information in LoongArch architecture, so add it.
Jens Axboe [Fri, 17 Jun 2022 12:24:26 +0000 (06:24 -0600)]
io_uring: recycle provided buffer if we punt to io-wq
io_arm_poll_handler() will recycle the buffer appropriately if we end
up arming poll (or if we're ready to retry), but not for the io-wq case
if we have attempted poll first.
Explicitly recycle the buffer to avoid both hanging on to it too long,
but also to avoid multiple reads grabbing the same one. This can happen
for ring mapped buffers, since it hasn't necessarily been committed.
Commit 8ff978b8b222 ("ipv4/raw: support binding to nonlocal addresses")
introduced a helper function to fold duplicated validity checks of bind
addresses into inet_addr_valid_or_nonlocal(). However, this caused an
unintended regression in ping_check_bind_addr(), which previously would
reject binding to multicast and broadcast addresses, but now these are
both incorrectly allowed as reported in [1].
This patch restores the original check. A simple reordering is done to
improve readability and make it evident that multicast and broadcast
addresses should not be allowed. Also, add an early exit for INADDR_ANY
which replaces lost behavior added by commit 0ce779a9f501 ("net: Avoid
unnecessary inet_addr_type() call when addr is INADDR_ANY").
Furthermore, this patch introduces regression selftests to catch these
specific cases.
Xu Jia [Fri, 17 Jun 2022 09:31:06 +0000 (17:31 +0800)]
hamradio: 6pack: fix array-index-out-of-bounds in decode_std_command()
Hulk Robot reports incorrect sp->rx_count_cooked value in decode_std_command().
This should be caused by the subtracting from sp->rx_count_cooked before.
It seems that sp->rx_count_cooked value is changed to 0, which bypassed the
previous judgment.
The situation is shown below:
(Thread 1) | (Thread 2)
decode_std_command() | resync_tnc()
... |
if (rest == 2) |
sp->rx_count_cooked -= 2; |
else if (rest == 3) | ...
| sp->rx_count_cooked = 0;
sp->rx_count_cooked -= 1; |
for (i = 0; i < sp->rx_count_cooked; i++) // report error
checksum += sp->cooked_buf[i];
sp->rx_count_cooked is a shared variable but is not protected by a lock.
The same applies to sp->rx_count. This patch adds a lock to fix the bug.
The fail log is shown below:
=======================================================================
UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
index 400 is out of range for type 'unsigned char [400]'
CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted 5.18.0-rc5-00163-g4b97bac0756a #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
ubsan_epilogue+0xb/0x50
__ubsan_handle_out_of_bounds.cold+0x62/0x6c
sixpack_receive_buf+0xfda/0x1330
tty_ldisc_receive_buf+0x13e/0x180
tty_port_default_receive_buf+0x6d/0xa0
flush_to_ldisc+0x213/0x3f0
process_one_work+0x98f/0x1620
worker_thread+0x665/0x1080
kthread+0x2e9/0x3a0
ret_from_fork+0x1f/0x30
...
Hoang Le [Fri, 17 Jun 2022 01:45:51 +0000 (08:45 +0700)]
tipc: fix use-after-free Read in tipc_named_reinit
syzbot found the following issue on:
==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
net/tipc/name_distr.c:413
Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
In the commit d966ddcc3821 ("tipc: fix a deadlock when flushing scheduled work"),
the cancel_work_sync() function just to make sure ONLY the work
tipc_net_finalize_work() is executing/pending on any CPU completed before
tipc namespace is destroyed through tipc_exit_net(). But this function
is not guaranteed the work is the last queued. So, the destroyed instance
may be accessed in the work which will try to enqueue later.
In order to completely fix, we re-order the calling of cancel_work_sync()
to make sure the work tipc_net_finalize_work() was last queued and it
must be completed by calling cancel_work_sync().
Jay Vosburgh [Thu, 16 Jun 2022 19:26:30 +0000 (12:26 -0700)]
veth: Add updating of trans_start
Since commit 21a75f0915dd ("bonding: Fix ARP monitor validation"),
the bonding ARP / ND link monitors depend on the trans_start time to
determine link availability. NETIF_F_LLTX drivers must update trans_start
directly, which veth does not do. This prevents use of the ARP or ND link
monitors with veth interfaces in a bond.
Resolve this by having veth_xmit update the trans_start time.
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
ALSA: x86: intel_hdmi_audio: enable pm_runtime and set autosuspend delay
The existing code uses pm_runtime_get_sync/put_autosuspend, but
pm_runtime was not explicitly enabled. The autosuspend delay was not
set either, the value is set to 5s since HDMI is rather painful to
resume.
ALSA: hda: intel-dspcfg: use SOF for UpExtreme and UpExtreme11 boards
The UpExtreme BIOS reports microphones that are not physically
present, so this module ends-up selecting SOF, while the UpExtreme11
BIOS does not report microphones so the snd-hda-intel driver is
selected.
For consistency use SOF unconditionally in autodetection mode. The use
of the snd-hda-intel driver can still be enabled with
'options snd-intel-dspcfg dsp_driver=1'
Linus Torvalds [Fri, 17 Jun 2022 04:39:51 +0000 (21:39 -0700)]
Merge tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular drm fixes for rc3. Nothing too serious, i915, amdgpu and
exynos all have a few small driver fixes, and two ttm fixes, and one
compiler warning.
i915:
- Fix page fault on error state read
- Fix memory leaks in per-gt sysfs
- Fix multiple fence handling
- Remove accidental static from a local variable
exynos:
- Check a null pointer instead of IS_ERR()
- Rework initialization code of Exynos MIC driver"
* tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Cap OLED brightness per max frame-average luminance
drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl
drm/exynos: mic: Rework initialization
drm/exynos: fix IS_ERR() vs NULL check in probe
drm/ttm: fix bulk move handling v2
drm/i915/uc: remove accidental static from a local variable
drm/i915: Individualize fences before adding to dma_resv obj
drm/i915/gt: Fix memory leaks in per-gt sysfs
drm/i915/reset: Fix error_state_read ptr + offset use
drm/ttm: fix missing NULL check in ttm_device_swapout
drm/atomic: fix warning of unused variable
Claudiu Manoil [Fri, 10 Jun 2022 08:40:37 +0000 (11:40 +0300)]
phy: aquantia: Fix AN when higher speeds than 1G are not advertised
Even when the eth port is resticted to work with speeds not higher than 1G,
and so the eth driver is requesting the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising for 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Merge branch 'bpf: Fix cookie values for kprobe multi'
Jiri Olsa says:
====================
hi,
there's bug in kprobe_multi link that makes cookies misplaced when
using symbols to attach. The reason is that we sort symbols by name
but not adjacent cookie values. Current test did not find it because
bpf_fentry_test* are already sorted by name.
v3 changes:
- fixed kprobe_multi bench test to filter out invalid entries
from available_filter_functions
v2 changes:
- rebased on top of bpf/master
- checking if cookies are defined later in swap function [Andrii]
- added acks
Jiri Olsa [Wed, 15 Jun 2022 11:21:17 +0000 (13:21 +0200)]
bpf: Force cookies array to follow symbols sorting
When user specifies symbols and cookies for kprobe_multi link
interface it's very likely the cookies will be misplaced and
returned to wrong functions (via get_attach_cookie helper).
The reason is that to resolve the provided functions we sort
them before passing them to ftrace_lookup_symbols, but we do
not do the same sort on the cookie values.
Fixing this by using sort_r function with custom swap callback
that swaps cookie values as well.
Jiri Olsa [Wed, 15 Jun 2022 11:21:15 +0000 (13:21 +0200)]
selftests/bpf: Shuffle cookies symbols in kprobe multi test
There's a kernel bug that causes cookies to be misplaced and
the reason we did not catch this with this test is that we
provide bpf_fentry_test* functions already sorted by name.
Shuffling function bpf_fentry_test2 deeper in the list and
keeping the current cookie values as before will trigger
the bug.
Tyrel Datwyler [Thu, 16 Jun 2022 19:11:25 +0000 (12:11 -0700)]
scsi: ibmvfc: Store vhost pointer during subcrq allocation
Currently the back pointer from a queue to the vhost adapter isn't set
until after subcrq interrupt registration. The value is available when a
queue is first allocated and can/should be also set for primary and async
queues as well as subcrqs.
This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS
interrupt controller where a pending subcrq interrupt from the previous
kernel can be replayed immediately upon IRQ registration resulting in
dereference of a garbage backpointer in ibmvfc_interrupt_scsi().
Tyrel Datwyler [Thu, 16 Jun 2022 19:11:26 +0000 (12:11 -0700)]
scsi: ibmvfc: Allocate/free queue resource only during probe/remove
Currently, the sub-queues and event pool resources are allocated/freed for
every CRQ connection event such as reset and LPM. This exposes the driver
to a couple issues. First the inefficiency of freeing and reallocating
memory that can simply be resued after being sanitized. Further, a system
under memory pressue runs the risk of allocation failures that could result
in a crippled driver. Finally, there is a race window where command
submission/compeletion can try to pull/return elements from/to an event
pool that is being deleted or already has been deleted due to the lack of
host state around freeing/allocating resources. The following is an example
of list corruption following a live partition migration (LPM):
Saurabh Sengar [Tue, 14 Jun 2022 07:05:55 +0000 (00:05 -0700)]
scsi: storvsc: Correct reporting of Hyper-V I/O size limits
Current code is based on the idea that the max number of SGL entries
also determines the max size of an I/O request. While this idea was
true in older versions of the storvsc driver when SGL entry length
was limited to 4 Kbytes, commit 3d9c3dcc58e9 ("scsi: storvsc: Enable
scatterlist entry lengths > 4Kbytes") removed that limitation. It's
now theoretically possible for the block layer to send requests that
exceed the maximum size supported by Hyper-V. This problem doesn't
currently happen in practice because the block layer defaults to a
512 Kbyte maximum, while Hyper-V in Azure supports 2 Mbyte I/O sizes.
But some future configuration of Hyper-V could have a smaller max I/O
size, and the block layer could exceed that max.
Fix this by correctly setting max_sectors as well as sg_tablesize to
reflect the maximum I/O size that Hyper-V reports. While allowing
I/O sizes larger than the block layer default of 512 Kbytes doesn’t
provide any noticeable performance benefit in the tests we ran, it's
still appropriate to report the correct underlying Hyper-V capabilities
to the Linux block layer.
Also tweak the virt_boundary_mask to reflect that the required
alignment derives from Hyper-V communication using a 4 Kbyte page size,
and not on the guest page size, which might be bigger (eg. ARM64).
Bart Van Assche [Mon, 13 Jun 2022 21:44:42 +0000 (14:44 -0700)]
scsi: ufs: Fix a race between the interrupt handler and the reset handler
Prevent that both the interrupt handler and the reset handler try to
complete a request at the same time. This patch is the result of an
analysis of the following crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000120
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE 5.10.107-android13-4-00051-g1e48e8970cca-ab8664745 #1
pc : ufshcd_release_scsi_cmd+0x30/0x46c
lr : __ufshcd_transfer_req_compl+0x4fc/0x9c0
Call trace:
ufshcd_release_scsi_cmd+0x30/0x46c
__ufshcd_transfer_req_compl+0x4fc/0x9c0
ufshcd_poll+0xf0/0x208
ufshcd_sl_intr+0xb8/0xf0
ufshcd_intr+0x168/0x2f4
__handle_irq_event_percpu+0xa0/0x30c
handle_irq_event+0x84/0x178
handle_fasteoi_irq+0x150/0x2e8
__handle_domain_irq+0x114/0x1e4
gic_handle_irq.31846+0x58/0x300
el1_irq+0xe4/0x1c0
cpuidle_enter_state+0x3ac/0x8c4
do_idle+0x2fc/0x55c
cpu_startup_entry+0x84/0x90
kernel_init+0x0/0x310
start_kernel+0x0/0x608
start_kernel+0x4ec/0x608
Bart Van Assche [Mon, 13 Jun 2022 21:44:41 +0000 (14:44 -0700)]
scsi: ufs: Support clearing multiple commands at once
Modify ufshcd_clear_cmd() such that it supports clearing multiple commands
at once instead of one command at a time. This change will be used in a
later patch to reduce the time spent in the reset handler.
Dave Airlie [Fri, 17 Jun 2022 00:24:42 +0000 (10:24 +1000)]
Merge tag 'drm-intel-fixes-2022-06-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.19-rc3:
- Fix page fault on error state read
- Fix memory leaks in per-gt sysfs
- Fix multiple fence handling
- Remove accidental static from a local variable
Mikulas Patocka [Thu, 16 Jun 2022 17:28:57 +0000 (13:28 -0400)]
dm mirror log: round up region bitmap size to BITS_PER_LONG
The code in dm-log rounds up bitset_size to 32 bits. It then uses
find_next_zero_bit_le on the allocated region. find_next_zero_bit_le
accesses the bitmap using unsigned long pointers. So, on 64-bit
architectures, it may access 4 bytes beyond the allocated size.
Fix this bug by rounding up bitset_size to BITS_PER_LONG.
This bug was found by running the lvm2 testsuite with kasan.
Mikulas Patocka [Thu, 16 Jun 2022 18:14:39 +0000 (14:14 -0400)]
dm: fix narrow race for REQ_NOWAIT bios being issued despite no support
Starting with the commit 63a225c9fd20, device mapper has an optimization
that it will take cheaper table lock (dm_get_live_table_fast instead of
dm_get_live_table) if the bio has REQ_NOWAIT. The bios with REQ_NOWAIT
must not block in the target request routine, if they did, we would be
blocking while holding rcu_read_lock, which is prohibited.
The targets that are suitable for REQ_NOWAIT optimization (and that don't
block in the map routine) have the flag DM_TARGET_NOWAIT set. Device
mapper will test if all the targets and all the devices in a table
support nowait (see the function dm_table_supports_nowait) and it will set
or clear the QUEUE_FLAG_NOWAIT flag on its request queue according to
this check.
There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) &&
!blk_queue_nowait(q)) goto not_supported" - this will make sure that
REQ_NOWAIT bios can't enter a request queue that doesn't support them.
This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets
that don't support the REQ_NOWAIT flag (and that may block in the map
routine) - except that there is a small race condition:
submit_bio_noacct checks if the queue has the QUEUE_FLAG_NOWAIT without
holding any locks. Immediatelly after this check, the device mapper table
may be reloaded with a table that doesn't support REQ_NOWAIT (for example,
if we start moving the logical volume or if we activate a snapshot).
However the REQ_NOWAIT bio that already passed the check in
submit_bio_noacct would be sent to device mapper, where it could be
redirected to a dm target that doesn't support REQ_NOWAIT - the result is
sleeping while we hold rcu_read_lock.
In order to fix this race, we double-check if the target supports
REQ_NOWAIT while we hold the table lock (so that the table can't change
under us).
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
Mikulas Patocka [Thu, 16 Jun 2022 17:21:27 +0000 (13:21 -0400)]
dm: fix use-after-free in dm_put_live_table_bio
dm_put_live_table_bio is called from the end of dm_submit_bio.
However, at this point, the bio may be already finished and the caller
may have freed the bio. Consequently, dm_put_live_table_bio accesses
the stale "bio" pointer.
Fix this bug by loading the bi_opf value and passing it to
dm_get_live_table_bio and dm_put_live_table_bio instead of the bio.
This bug was found by running the lvm2 testsuite with kasan.
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
Dave Airlie [Thu, 16 Jun 2022 23:31:22 +0000 (09:31 +1000)]
Merge tag 'drm-misc-fixes-2022-06-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for TTM, one for a NULL pointer dereference and one to make sure
the buffer is pinned prior to a bulk move, and a fix for a spurious
compiler warning.
Steve French [Thu, 16 Jun 2022 03:40:23 +0000 (22:40 -0500)]
smb3: add trace point for SMB2_set_eof
In order to debug problems with file size being reported incorrectly
temporarily (in this case xfstest generic/584 intermittent failure)
we need to add trace point for the non-compounded code path where
we set the file size (SMB2_set_eof). The new trace point is:
"smb3_set_eof"
Joel Savitz [Thu, 9 Jun 2022 20:32:17 +0000 (16:32 -0400)]
selftests: make use of GUP_TEST_FILE macro
Commit 17de1e559cf1 ("selftests: clarify common error when running
gup_test") had most of its hunks dropped due to a conflict with another
patch accepted into Linux around the same time that implemented the same
behavior as a subset of other changes.
However, the remaining hunk defines the GUP_TEST_FILE macro without
making use of it. This patch makes use of the macro in the two relevant
places.
Furthermore, the above mentioned commit's log message erroneously describes
the changes that were dropped from the patch.
Bart Van Assche [Mon, 13 Jun 2022 16:32:34 +0000 (09:32 -0700)]
block/bfq: Enable I/O statistics
BFQ uses io_start_time_ns. That member variable is only set if I/O
statistics are enabled. Hence this patch that enables I/O statistics
at the time BFQ is associated with a request queue.
Linus Torvalds [Thu, 16 Jun 2022 22:53:38 +0000 (15:53 -0700)]
Merge tag 'audit-pr-20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single audit patch to fix a problem where we were not properly
freeing memory allocated when recording information related to a
module load"
* tag 'audit-pr-20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: free module name
Linus Torvalds [Thu, 16 Jun 2022 22:50:36 +0000 (15:50 -0700)]
Merge tag 'selinux-pr-20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single SELinux patch to fix memory leaks when mounting filesystems
with SELinux mount options"
* tag 'selinux-pr-20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: free contexts previously transferred in selinux_add_opt()
Heiko Stuebner [Thu, 26 May 2022 20:56:45 +0000 (22:56 +0200)]
riscv: fix dependency for t-head errata
alternatives only work correctly on non-xip-kernels and while the
selected alternative-symbol has the correct dependency the symbol
selecting it also needs that dependency.
So add the missing dependency to the T-Head errata Kconfig symbol.
Palmer Dabbelt [Thu, 16 Jun 2022 22:13:10 +0000 (15:13 -0700)]
Merge tag 'dt-fixes-for-palmer-5.19-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/conor/linux into fixes
Microchip RISC-V devicetree fixes for 5.19-rc3
A single fix for mpfs.dtsi:
- The sifive pdma entry fell through the cracks between versions of my
dt patches & I gave Zong the wrong conflict resolution, so it is
added back.
* tag 'dt-fixes-for-palmer-5.19-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: re-add pdma to mpfs device tree
cached operations sometimes need to do invalid operations (e.g. read
on a write only file)
Historic fscache had added a "writeback fid", a special handle opened
RW as root, for this. The conversion to new fscache missed that bit.
This commit reinstates a slightly lesser variant of the original code
that uses the writeback fid for partial pages backfills if the regular
user fid had been open as WRONLY, and thus would lack read permissions.
Ming Lei [Thu, 16 Jun 2022 01:44:01 +0000 (09:44 +0800)]
blk-mq: don't clear flush_rq from tags->rqs[]
commit 364b61818f65 ("blk-mq: clearing flush request reference in
tags->rqs[]") is added to clear the to-be-free flush request from
tags->rqs[] for avoiding use-after-free on the flush rq.
Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time
by ~8s because running scsi probe which may create and remove lots of
unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and
each request queue has lots of hw queues.
Improve the situation by not running blk_mq_clear_flush_rq_mapping if
disk isn't added when there can't be any flush request issued.
Ming Lei [Thu, 16 Jun 2022 01:44:00 +0000 (09:44 +0800)]
blk-mq: avoid to touch q->elevator without any protection
q->elevator is referred in blk_mq_has_sqsched() without any protection,
no .q_usage_counter is held, no queue srcu and rcu read lock is held,
so potential use-after-free may be triggered.
Fix the issue by adding one queue flag for checking if the elevator
uses single queue style dispatch. Meantime the elevator feature flag
of ELEVATOR_F_MQ_AWARE isn't needed any more.
Ming Lei [Thu, 16 Jun 2022 01:43:59 +0000 (09:43 +0800)]
blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
elevator can be tore down by sysfs switch interface or disk release, so
hold ->sysfs_lock before referring to q->elevator, then potential
use-after-free can be avoided.
Bart Van Assche [Wed, 15 Jun 2022 21:00:04 +0000 (14:00 -0700)]
block: Fix handling of offline queues in blk_mq_alloc_request_hctx()
This patch prevents that test nvme/004 triggers the following:
UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9
index 512 is out of range for type 'long unsigned int [512]'
Call Trace:
show_stack+0x52/0x58
dump_stack_lvl+0x49/0x5e
dump_stack+0x10/0x12
ubsan_epilogue+0x9/0x3b
__ubsan_handle_out_of_bounds.cold+0x44/0x49
blk_mq_alloc_request_hctx+0x304/0x310
__nvme_submit_sync_cmd+0x70/0x200 [nvme_core]
nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics]
nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop]
nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop]
nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics]
nvmf_dev_write+0xae/0x111 [nvme_fabrics]
vfs_write+0x144/0x560
ksys_write+0xb7/0x140
__x64_sys_write+0x42/0x50
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Jakub Sitnicki [Thu, 16 Jun 2022 16:20:37 +0000 (18:20 +0200)]
selftests/bpf: Test tail call counting with bpf2bpf and data on stack
Cover the case when tail call count needs to be passed from BPF function to
BPF function, and the caller has data on stack. Specifically when the size
of data allocated on BPF stack is not a multiple on 8.
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
Linus Torvalds [Thu, 16 Jun 2022 18:51:32 +0000 (11:51 -0700)]
Merge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
Joanne Koong [Wed, 15 Jun 2022 19:32:13 +0000 (12:32 -0700)]
Revert "net: Add a second bind table hashed by port and address"
This reverts:
commit d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit 538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry") Link: https://lore.kernel.org/netdev/[email protected]/
There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks
Mark Brown [Wed, 15 Jun 2022 19:15:04 +0000 (20:15 +0100)]
arm64/cpufeature: Unexport set_cpu_feature()
We currently export set_cpu_feature() to modules but there are no in tree
users that can be built as modules and it is hard to see cases where it
would make sense for there to be any such users. Remove the export to avoid
anyone else having to worry about why it is there and ensure that any users
that do get added get a bit more visiblity.
Jan Kara [Fri, 20 May 2022 11:14:02 +0000 (13:14 +0200)]
ext4: improve write performance with disabled delalloc
When delayed allocation is disabled (either through mount option or
because we are running low on free space), ext4_write_begin() allocates
blocks with EXT4_GET_BLOCKS_IO_CREATE_EXT flag. With this flag extent
merging is disabled and since ext4_write_begin() is called for each page
separately, we end up with a *lot* of 1 block extents in the extent tree
and following writeback is writing 1 block at a time which results in
very poor write throughput (4 MB/s instead of 200 MB/s). These days when
ext4_get_block_unwritten() is used only by ext4_write_begin(),
ext4_page_mkwrite() and inline data conversion, we can safely allow
extent merging to happen from these paths since following writeback will
happen on different boundaries anyway. So use
EXT4_GET_BLOCKS_CREATE_UNRIT_EXT instead which restores the performance.
Zhang Yi [Fri, 20 May 2022 02:32:16 +0000 (10:32 +0800)]
ext4: fix warning when submitting superblock in ext4_commit_super()
We have already check the io_error and uptodate flag before submitting
the superblock buffer, and re-set the uptodate flag if it has been
failed to write out. But it was lockless and could be raced by another
ext4_commit_super(), and finally trigger '!uptodate' WARNING when
marking buffer dirty. Fix it by submit buffer directly.
Dylan Yudaken [Thu, 16 Jun 2022 13:50:11 +0000 (06:50 -0700)]
io_uring: do not use prio task_work_add in uring_cmd
io_req_task_prio_work_add has a strict assumption that it will only be
used with io_req_task_complete. There is a codepath that assumes this is
the case and will not even call the completion function if it is hit.
For uring_cmd with an arbitrary completion function change the call to the
correct non-priority version.
Add the description of @folio and remove @page in function kernel-doc
comment to remove warnings found by running scripts/kernel-doc, which
is caused by using 'make W=1'.
fs/jbd2/transaction.c:2149: warning: Function parameter or member
'folio' not described in 'jbd2_journal_try_to_free_buffers'
fs/jbd2/transaction.c:2149: warning: Excess function parameter 'page'
description in 'jbd2_journal_try_to_free_buffers'
For recv/recvmsg, IO either completes immediately or gets queued for a
retry. This isn't the case for read/readv, if eg a normal file or a block
device is used. Here, an operation can get queued with the block layer.
If this happens, ring mapped buffers must get committed immediately to
avoid that the next read can consume the same buffer.
Check if we're dealing with pollable file, when getting a new ring mapped
provided buffer. If it's not, commit it immediately rather than wait post
issue. If we don't wait, we can race with completions coming in, or just
plain buffer reuse by committing after a retry where others could have
grabbed the same buffer.
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers") Reviewed-by: Hao Xu <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
Jan Kara [Wed, 15 Jun 2022 13:22:29 +0000 (15:22 +0200)]
init: Initialize noop_backing_dev_info early
noop_backing_dev_info is used by superblocks of various
pseudofilesystems such as kdevtmpfs. After commit 10e14073107d
("writeback: Fix inode->i_io_list not be protected by inode->i_lock
error") this broke because __mark_inode_dirty() started to access more
fields from noop_backing_dev_info and this led to crashes inside
locked_inode_to_wb_and_lock_list() called from __mark_inode_dirty().
Fix the problem by initializing noop_backing_dev_info before the
filesystems get mounted.
Fixes: 10e14073107d ("writeback: Fix inode->i_io_list not be protected by inode->i_lock error") Reported-and-tested-by: Suzuki K Poulose <[email protected]> Reported-and-tested-by: Alexandru Elisei <[email protected]> Reported-and-tested-by: Guenter Roeck <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jan Kara <[email protected]>
Ye Bin [Wed, 15 Jun 2022 09:00:10 +0000 (17:00 +0800)]
ext2: fix fs corruption when trying to remove a non-empty directory with IO error
We got issue as follows:
[home]# mount /dev/sdd test
[home]# cd test
[test]# ls
dir1 lost+found
[test]# rmdir dir1
ext2_empty_dir: inject fault
[test]# ls
lost+found
[test]# cd ..
[home]# umount test
[home]# fsck.ext2 -fn /dev/sdd
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Inode 4065, i_size is 0, should be 1024. Fix? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Unconnected directory inode 4065 (/???)
Connect to /lost+found? no
'..' in ... (4065) is / (2), should be <The NULL inode> (0).
Fix? no
Pass 4: Checking reference counts
Inode 2 ref count is 3, should be 4. Fix? no
Inode 4065 ref count is 2, should be 3. Fix? no
Pass 5: Checking group summary information
/dev/sdd: ********** WARNING: Filesystem still has errors **********
Darrick J. Wong [Mon, 6 Jun 2022 01:51:22 +0000 (18:51 -0700)]
xfs: fix variable state usage
The variable @args is fed to a tracepoint, and that's the only place
it's used. This is fine for the kernel, but for userspace, tracepoints
are #define'd out of existence, which results in this warning on gcc
11.2:
xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
1440 | struct xfs_da_args *args = attr->xattri_da_args;
| ^~~~
This isn't a particularly severe problem right now because xattr logging
is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
what they're doing.
However, the eventual intent is that callers should be able to ask for
the assistance of the log in persisting xattr updates. This capability
might not be required for /all/ callers, which means that dynamic
control must work correctly. Once an xattr update has decided whether
or not to use logged xattrs, it needs to stay in that mode until the end
of the operation regardless of what subsequent parallel operations might
do.
Therefore, it is an error to continue sampling xfs_globals.larp once
xfs_attr_change has made a decision about larp, and it was not correct
for me to have told Allison that ->create_intent functions can sample
the global log incompat feature bitfield to decide to elide a log item.
Instead, create a new op flag for the xfs_da_args structure, and convert
all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
the attr update state machine to look for the operations flag.
Cc: [email protected] Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls") Signed-off-by: Christian Göttsche <[email protected]> Signed-off-by: Paul Moore <[email protected]>
Linus Torvalds [Wed, 15 Jun 2022 21:20:26 +0000 (14:20 -0700)]
Merge tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)
- Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)
* tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: Make usercopy resilient against ridiculously large copies
usercopy: Cast pointer to an integer once
usercopy: Handle vm_map_ram() areas
cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
Petr Mladek [Wed, 15 Jun 2022 16:28:05 +0000 (18:28 +0200)]
printk: Wait for the global console lock when the system is going down
There are reports that the console kthreads block the global console
lock when the system is going down, for example, reboot, panic.
First part of the solution was to block kthreads in these problematic
system states so they stopped handling newly added messages.
Second part of the solution is to wait when for the kthreads when
they are actively printing. It solves the problem when a message
was printed before the system entered the problematic state and
the kthreads managed to step in.
A busy waiting has to be used because panic() can be called in any
context and in an unknown state of the scheduler.
There must be a timeout because the kthread might get stuck or sleeping
and never release the lock. The timeout 10s is an arbitrary value
inspired by the softlockup timeout.