Git Repo - linux.git/log

io_uring/net: correctly handle multishot recvmsg retry setup

If we loop for multishot receive on the initial attempt, and then abort
later on to wait for more, we miss a case where we should be copying the
io_async_msghdr from the stack to stable storage. This leads to the next
retry potentially failing, if the application had the msghdr on the
stack.

Cc: [email protected]
Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg")
Signed-off-by: Jens Axboe <[email protected]>

scripts/gdb/symbols: fix invalid escape sequence warning

With python 3.12, '\.' results in this warning
SyntaxWarning: invalid escape sequence '\.'

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrew Ballance <[email protected]>
Cc: Jan Kiszka <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Koudai Iwahori <[email protected]>
Cc: Kuan-Ying Lee <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Pankaj Raghav <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal

Calling irq_domain_remove() will lead to freeing the IRQ domain
prematurely. The domain is still referenced and will be attempted to get
used via rmi_free_function_list() -> rmi_unregister_function() ->
irq_dispose_mapping() -> irq_get_irq_data()'s ->domain pointer.

With PaX's MEMORY_SANITIZE this will lead to an access fault when
attempting to dereference embedded pointers, as in Torsten's report that
was faulting on the 'domain->ops->unmap' test.

Fix this by releasing the IRQ domain only after all related IRQs have
been deactivated.

Fixes: 24d28e4f1271 ("Input: synaptics-rmi4 - convert irq distribution to irq_domain")
Reported-by: Torsten Hilbrich <[email protected]>
Signed-off-by: Mathias Krause <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>

io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler

This flag should not be persistent across retries, so ensure we clear
it before potentially attemting a retry.

Fixes: c3f9109dbc9e ("io_uring/kbuf: flag request if buffer pool is empty after buffer pick")
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'spi-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
"One small fix for the newly added cs42l43 driver which would have
  caused it problems working in some system configurations by needlessly
  restricting chip select configurations"

* tag 'spi-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: cs42l43: Don't limit native CS to the first chip select

Merge tag 'regulator-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of small fixes for the rk808 driver, the regulator voltage
  configurations were incorrectly described.

  The changes are not expected to have practical impact but given that
  we're dealing with power it's generally better to follow the hardware
  specification as closely as we can to avoid unexpected stresses"

* tag 'regulator-fix-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: rk808: fix LDO range on RK806
  regulator: rk808: fix buck range on RK806

cdrom: gdrom: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

io_uring: fix io_queue_proc modifying req->flags

With multiple poll entries __io_queue_proc() might be running in
parallel with poll handlers and possibly task_work, we should not be
carelessly modifying req->flags there. io_poll_double_prepare() handles
a similar case with locking but it's much easier to move it into
__io_arm_poll_handler().

Cc: [email protected]
Fixes: 595e52284d24a ("io_uring/poll: don't enable lazy wake for POLLEXCLUSIVE")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/455cc49e38cf32026fa1b49670be8c162c2cb583.1709834755.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Will Deacon:
"A lonely arm64 fix addressing a kprobes regression that we introduced
  during the merge window:

   - Fix recursive kprobes regression when probing the stack unwinder"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: prohibit probing on arch_kunwind_consume_entry()

Merge tag 'erofs-for-6.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
"The main one is a KMSAN fix which addresses an issue introduced in
  this cycle so it'd be much better to fix before releasing, and the
  remaining one fixes VMA alignment for THP.

  Summary:

   - Fix a KMSAN uninit-value issue triggered by a crafted image

   - Fix VMA alignment for memory mapped files on THP"

* tag 'erofs-for-6.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: apply proper VMA alignment for memory mapped files on THP
  erofs: fix uninitialized page cache reported by KMSAN

Merge tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, ipsec and netfilter.

  No solution yet for the stmmac issue mentioned in the last PR, but it
  proved to be a lockdep false positive, not a blocker.

  Current release - regressions:

   - dpll: move all dpll<>netdev helpers to dpll code, fix build
     regression with old compilers

  Current release - new code bugs:

   - page_pool: fix netlink dump stop/resume

  Previous releases - regressions:

   - bpf: fix verifier to check bpf_func_state->callback_depth when
     pruning states as otherwise unsafe programs could get accepted

   - ipv6: avoid possible UAF in ip6_route_mpath_notify()

   - ice: reconfig host after changing MSI-X on VF

   - mlx5:
       - e-switch, change flow rule destination checking
       - add a memory barrier to prevent a possible null-ptr-deref
       - switch to using _bh variant of of spinlock where needed

  Previous releases - always broken:

   - netfilter: nf_conntrack_h323: add protection for bmp length out of
     range

   - bpf: fix to zero-initialise xdp_rxq_info struct before running XDP
     program in CPU map which led to random xdp_md fields

   - xfrm: fix UDP encapsulation in TX packet offload

   - netrom: fix data-races around sysctls

   - ice:
       - fix potential NULL pointer dereference in ice_bridge_setlink()
       - fix uninitialized dplls mutex usage

   - igc: avoid returning frame twice in XDP_REDIRECT

   - i40e: disable NAPI right after disabling irqs when handling
     xsk_pool

   - geneve: make sure to pull inner header in geneve_rx()

   - sparx5: fix use after free inside sparx5_del_mact_entry

   - dsa: microchip: fix register write order in ksz8_ind_write8()

  Misc:

   - selftests: mptcp: fixes for diag.sh"

* tag 'net-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: pds_core: Fix possible double free in error handling path
  netrom: Fix data-races around sysctl_net_busy_read
  netrom: Fix a data-race around sysctl_netrom_link_fails_count
  netrom: Fix a data-race around sysctl_netrom_routing_control
  netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
  netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
  netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
  netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
  netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
  netrom: Fix a data-race around sysctl_netrom_transport_timeout
  netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
  netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
  netrom: Fix a data-race around sysctl_netrom_default_path_quality
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
  net/rds: fix WARNING in rds_conn_connect_if_down
  net: dsa: microchip: fix register write order in ksz8_ind_write8()
  ...

Merge tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme into for-6.9/block

Pull NVMe updates from Keith:

"nvme updates for Linux 6.9

- RDMA target enhancements (Max)
- Fabrics fixes (Max, Guixin, Hannes)
- Atomic queue_limits usage (Christoph)
- Const use for class_register (Ricardo)
- Identification error handling fixes (Shin'ichiro, Keith)"

* tag 'nvme-6.9-2024-03-07' of git://git.infradead.org/nvme: (31 commits)
  nvme: clear caller pointer on identify failure
  nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse()
  nvme: fcloop: make fcloop_class constant
  nvme: fabrics: make nvmf_class constant
  nvme: core: constify struct class usage
  nvme-fabrics: typo in nvmf_parse_key()
  nvme-multipath: use atomic queue limits API for stacking limits
  nvme-multipath: pass queue_limits to blk_alloc_disk
  nvme: use the atomic queue limits update API
  nvme: cleanup nvme_configure_metadata
  nvme: don't query identify data in configure_metadata
  nvme: split out a nvme_identify_ns_nvm helper
  nvme: move common logic into nvme_update_ns_info
  nvme: move setting the write cache flags out of nvme_set_queue_limits
  nvme: move a few things out of nvme_update_disk_info
  nvme: don't use nvme_update_disk_info for the multipath disk
  nvme: move blk_integrity_unregister into nvme_init_integrity
  nvme: cleanup the nvme_init_integrity calling conventions
  nvme: move max_integrity_segments handling out of nvme_init_integrity
  nvme: remove nvme_revalidate_zones
  ...

io_uring: fix mshot read defer taskrun cqe posting

We can't post CQEs from io-wq with DEFER_TASKRUN set, normal completions
are handled but aux should be explicitly disallowed by opcode handlers.

Cc: [email protected]
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/6fb7cba6f5366da25f4d3eb95273f062309d97fa.1709740837.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>

net: pds_core: Fix possible double free in error handling path

When auxiliary_device_add() returns error and then calls
auxiliary_device_uninit(), Callback function pdsc_auxbus_dev_release
calls kfree(padev) to free memory. We shouldn't call kfree(padev)
again in the error handling path.

Fix this by cleaning up the redundant kfree() and putting
the error handling back to where the errors happened.

Fixes: 4569cce43bc6 ("pds_core: add auxiliary_bus devices")
Signed-off-by: Yongzhi Liu <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc

Merge case-insensitive updates from Gabriel Krisman Bertazi:

- Patch case-insensitive lookup by trying the case-exact comparison
  first, before falling back to costly utf8 casefolded comparison.

- Fix to forbid using a case-insensitive directory as part of an
  overlayfs mount.

- Patchset to ensure d_op are set at d_alloc time for fscrypt and
  casefold volumes, ensuring filesystem dentries will all have the
  correct ops, whether they come from a lookup or not.

* tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  libfs: Drop generic_set_encrypted_ci_d_ops
  ubifs: Configure dentry operations at dentry-creation time
  f2fs: Configure dentry operations at dentry-creation time
  ext4: Configure dentry operations at dentry-creation time
  libfs: Add helper to choose dentry operations at mount-time
  libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
  fscrypt: Drop d_revalidate once the key is added
  fscrypt: Drop d_revalidate for valid dentries during lookup
  fscrypt: Factor out a helper to configure the lookup dentry
  ovl: Always reject mounting over case-insensitive directories
  libfs: Attempt exact-match comparison first during casefolded lookup

Signed-off-by: Christian Brauner <[email protected]>

x86/fred: Fix init_task thread stack pointer initialization

As TOP_OF_KERNEL_STACK_PADDING was defined as 0 on x86_64, it went
unnoticed that the initialization of the .sp field in INIT_THREAD and some
calculations in the low level startup code do not take the padding into
account.

FRED enabled kernels require a 16 byte padding, which means that the init
task initialization and the low level startup code use the wrong stack
offset.

Subtract TOP_OF_KERNEL_STACK_PADDING in all affected places to adjust for
this.

Fixes: 65c9cc9e2c14 ("x86/fred: Reserve space for the FRED stack frame")
Fixes: 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Xin Li (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains fixes for net:

Patch #1 disallows anonymous sets with timeout, except for dynamic sets.
         Anonymous sets with timeouts using the pipapo set backend makes
         no sense from userspace perspective.

Patch #2 rejects constant sets with timeout which has no practical usecase.
         This kind of set, once bound, contains elements that expire but
         no new elements can be added.

Patch #3 restores custom conntrack expectations with NFPROTO_INET,
         from Florian Westphal.

Patch #4 marks rhashtable anonymous set with timeout as dead from the
         commit path to avoid that async GC collects these elements. Rules
         that refers to the anonymous set get released with no mutex held
         from the commit path.

Patch #5 fixes a UBSAN shift overflow in H.323 conntrack helper,
         from Lena Wang.

netfilter pull request 24-03-07

* tag 'nf-24-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_conntrack_h323: Add protection for bmp length out of range
  netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
  netfilter: nft_ct: fix l3num expectations with inet pseudo family
  netfilter: nf_tables: reject constant set with timeout
  netfilter: nf_tables: disallow anonymous set with timeout flag
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

Merge branch 'netrom-fix-all-the-data-races-around-sysctls'

Jason Xing says:

====================
netrom: Fix all the data-races around sysctls

As the title said, in this patchset I fix the data-race issues because
the writer and the reader can manipulate the same value concurrently.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix data-races around sysctl_net_busy_read

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_link_fails_count

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_routing_control

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_busy_delay

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_transport_timeout

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser

We need to protect the reader reading the sysctl value because the
value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser

We need to protect the reader reading the sysctl value
because the value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

netrom: Fix a data-race around sysctl_netrom_default_path_quality

We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>

drm/tests/buddy: fix print format

This will report a build warning once we have: 806cb2270237 ("kunit:
Annotate _MSG assertion variants with gnu printf specifiers").

Reported-by: Stephen Rothwell <[email protected]>
Fixes: c70703320e55 ("drm/tests/drm_buddy: add alloc_range_bias test")
Signed-off-by: Matthew Auld <[email protected]>
Cc: Arunpravin Paneer Selvam <[email protected]>
Cc: Christian König <[email protected]>
Reviewed-by: Arunpravin Paneer Selvam <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maxime Ripard <[email protected]>

drm/xe: Return immediately on tile_init failure

There's no reason to proceed with applying workaround and initing
sysfs if we are going to abort the probe upon failure.

Fixes: e5a845fd8fa4 ("drm/xe: Add sysfs entry for tile")
Cc: Lucas De Marchi <[email protected]>
Cc: Matt Roper <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Matt Roper <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>
(cherry picked from commit af7b93d1d7eeeef674681ddea875be6a29857a5d)
Signed-off-by: Thomas Hellström <[email protected]>

Merge tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2024-03-06

1) Clear the ECN bits flowi4_tos in decode_session4().
   This was already fixed but the bug was reintroduced
   when decode_session4() switched to us the flow dissector.
   From Guillaume Nault.

2) Fix UDP encapsulation in the TX path with packet offload mode.
   From Leon Romanovsky,

3) Avoid clang fortify warning in copy_to_user_tmpl().
   From Nathan Chancellor.

4) Fix inter address family tunnel in packet offload mode.
   From Mike Yu.

* tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: set skb control buffer based on packet offload as well
  xfrm: fix xfrm child route lookup for packet offload
  xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
  xfrm: Pass UDP encapsulation in TX packet offload
  xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-03-06

We've added 5 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 77 insertions(+), 4 deletions(-).

The main changes are:

1) Fix BPF verifier to check bpf_func_state->callback_depth when pruning
   states as otherwise unsafe programs could get accepted,
   from Eduard Zingerman.

2) Fix to zero-initialise xdp_rxq_info struct before running XDP program in
   CPU map which led to random xdp_md fields, from Toke Høiland-Jørgensen.

3) Fix bonding XDP feature flags calculation when bonding device has no
   slave devices anymore, from Daniel Borkmann.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
  selftests/bpf: Fix up xdp bonding test wrt feature flags
  xdp, bonding: Fix feature flags when there are no slave devs anymore
  selftests/bpf: test case for callback_depth states pruning logic
  bpf: check bpf_func_state->callback_depth when pruning states
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

erofs: apply proper VMA alignment for memory mapped files on THP

There are mainly two reasons that thp_get_unmapped_area() should be
used for EROFS as other filesystems:

- It's needed to enable PMD mappings as a FSDAX filesystem, see
   commit 74d2fad1334d ("thp, dax: add thp_get_unmapped_area for pmd
   mappings");

- It's useful together with large folios and
   CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
   (e.g. shared libraries) even without FSDAX.  See commit 1854bc6e2420
   ("mm/readahead: Align file mappings for non-DAX").

Fixes: 06252e9ce05b ("erofs: dax support for non-tailpacking regular file")
Fixes: ce529cc25b18 ("erofs: enable large folios for iomap mode")
Fixes: e6687b89225e ("erofs: enable large folios for fscache mode")
Reviewed-by: Jingbo Xu <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

erofs: fix uninitialized page cache reported by KMSAN

syzbot reports a KMSAN reproducer [1] which generates a crafted
filesystem image and causes IMA to read uninitialized page cache.

Later, (rq->outputsize > rq->inputsize) will be formally supported
after either large uncompressed pclusters (> block size) or big
lclusters are landed. However, currently there is no way to generate
such filesystems by using mkfs.erofs.

Thus, let's mark this condition as unsupported for now.

[1] https://lore.kernel.org/r/0000000000002be12a0611ca7ff8@google.com

Reported-and-tested-by: [email protected]
Fixes: 1ca01520148a ("erofs: refine z_erofs_transform_plain() for sub-page block support")
Reviewed-by: Sandeep Dhavale <[email protected]>
Reviewed-by: Yue Hu <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

netfilter: nf_conntrack_h323: Add protection for bmp length out of range

UBSAN load reports an exception of BRK#5515 SHIFT_ISSUE:Bitwise shifts
that are out of bounds for their data type.

vmlinux   get_bitmap(b=75) + 712
<net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux   decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
<net/netfilter/nf_conntrack_h323_asn1.c:592>
vmlinux   decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux   decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
<net/netfilter/nf_conntrack_h323_asn1.c:576>
vmlinux   decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux   DecodeRasMessage() + 304
<net/netfilter/nf_conntrack_h323_asn1.c:833>
vmlinux   ras_help() + 684
<net/netfilter/nf_conntrack_h323_main.c:1728>
vmlinux   nf_confirm() + 188
<net/netfilter/nf_conntrack_proto.c:137>

Due to abnormal data in skb->data, the extension bitmap length
exceeds 32 when decoding ras message then uses the length to make
a shift operation. It will change into negative after several loop.
UBSAN load could detect a negative shift as an undefined behaviour
and reports exception.
So we add the protection to avoid the length exceeding 32. Or else
it will return out of range error and stop decoding.

Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Signed-off-by: Lena Wang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout

While the rhashtable set gc runs asynchronously, a race allows it to
collect elements from anonymous sets with timeouts while it is being
released from the commit path.

Mingi Cho originally reported this issue in a different path in 6.1.x
with a pipapo set with low timeouts which is not possible upstream since
7395dfacfff6 ("netfilter: nf_tables: use timestamp to check for set
element timeout").

Fix this by setting on the dead flag for anonymous sets to skip async gc
in this case.

According to 08e4c8c5919f ("netfilter: nf_tables: mark newset as dead on
transaction abort"), Florian plans to accelerate abort path by releasing
objects via workqueue, therefore, this sets on the dead flag for abort
path too.

Cc: [email protected]
Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
Reported-by: Mingi Cho <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nft_ct: fix l3num expectations with inet pseudo family

Following is rejected but should be allowed:

table inet t {
        ct expectation exp1 {
                [..]
                l3proto ip

Valid combos are:
table ip t, l3proto ip
table ip6 t, l3proto ip6
table inet t, l3proto ip OR l3proto ip6

Disallow inet pseudeo family, the l3num must be a on-wire protocol known
to conntrack.

Retain NFPROTO_INET case to make it clear its rejected
intentionally rather as oversight.

Fixes: 8059918a1377 ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: reject constant set with timeout

This set combination is weird: it allows for elements to be
added/deleted, but once bound to the rule it cannot be updated anymore.
Eventually, all elements expire, leading to an empty set which cannot
be updated anymore. Reject this flags combination.

Cc: [email protected]
Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>

netfilter: nf_tables: disallow anonymous set with timeout flag

Anonymous sets are never used with timeout from userspace, reject this.
Exception to this rule is NFT_SET_EVAL to ensure legacy meters still work.

Cc: [email protected]
Fixes: 761da2935d6e ("netfilter: nf_tables: add set timeout API support")
Reported-by: lonial con <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

drm/amdgpu/pm: Fix the error of pwm1_enable setting

Fix the pwm_mode value error which used for
pwm1_enable setting

Signed-off-by: Ma Jun <[email protected]>
Reviewed-by: Lijo Lazar <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: handle range offsets in VRR ranges

Need to check the offset bits for values greater than 255.

v2: also update amdgpu_dm_connector values.

Suggested-by: Mano Ségransan <[email protected]>
Tested-by: Mano Ségransan <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3203
Reviewed-by: Harry Wentland <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]

drm/amd/display: check dc_link before dereferencing

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6683 amdgpu_dm_connector_funcs_force()
warn: variable dereferenced before check 'dc_link' (see line 6663)

Fixes: 967176179215 ("drm/amd/display: fix null-pointer dereference on edid reading")
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Melissa Wen <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>

drm/amd/swsmu: modify the gfx activity scaling

Add an if condition for gfx activity because the scaling has been changed after smu fw version 5d4600.
And remove a warning log.

Signed-off-by: Li Ma <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected] # 6.7.x

tracing: Limit trace_marker writes to just 4K

Limit the max print event of trace_marker to just 4K string size. This must
also be less than the amount that can be held by a trace_seq along with
the text that is before the output (like the task name, PID, CPU, state,
etc). As trace_seq is made to handle large events (some greater than 4K).
Make the max size of a trace_marker write event be 4K which is guaranteed
to fit in the trace_seq buffer.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Suggested-by: Linus Torvalds <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE

The trace_seq buffer is used to print out entire events. It's typically
set to PAGE_SIZE * 2 as there's some events that can be quite large.

As a side effect, writes to trace_marker is limited by both the size of the
trace_seq buffer as well as the ring buffer's sub-buffer size (which is a
power of PAGE_SIZE). By limiting the trace_seq size, it also limits the
size of the largest string written to trace_marker.

trace_seq does not need to be dependent on PAGE_SIZE like the ring buffer
sub-buffers need to be. Hard code it to 8K which is PAGE_SIZE * 2 on most
architectures. This will also limit the size of trace_marker on those
architectures with greater than 4K PAGE_SIZE.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Sachin Sant <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

tracing: Remove precision vsnprintf() check from print event

This reverts 60be76eeabb3d ("tracing: Add size check when printing
trace_marker output"). The only reason the precision check was added
was because of a bug that miscalculated the write size of the string into
the ring buffer and it truncated it removing the terminating nul byte. On
reading the trace it crashed the kernel. But this was due to the bug in
the code that happened during development and should never happen in
practice. If anything, the precision can hide bugs where the string in the
ring buffer isn't nul terminated and it will not be checked.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Linus Torvalds <[email protected]>
Fixes: 60be76eeabb3d ("tracing: Add size check when printing trace_marker output")
Reported-by: Sachin Sant <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

Merge tag 'md-6.9-20240306' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block

Pull MD atomic queue limits changes from Song.

* tag 'md-6.9-20240306' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  block: remove disk_stack_limits
  md: remove mddev->queue
  md: don't initialize queue limits
  md/raid10: use the atomic queue limit update APIs
  md/raid5: use the atomic queue limit update APIs
  md/raid1: use the atomic queue limit update APIs
  md/raid0: use the atomic queue limit update APIs
  md: add queue limit helpers
  md: add a mddev_is_dm helper
  md: add a mddev_add_trace_msg helper
  md: add a mddev_trace_remap helper

spi: cs42l43: Don't limit native CS to the first chip select

As the chip selects can be configured through ACPI/OF/swnode, and
the set_cs() callback will only be called when a native chip select
is being used, there is no reason for the driver to only support the
native chip select as the first chip select. Remove the check that
introduces this limitation.

Fixes: ef75e767167a ("spi: cs42l43: Add SPI controller support")
Signed-off-by: Charles Keepax <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll

Use source instead of ret, which seems to be unrelated and will always
be zero.

Signed-off-by: Stuart Henderson <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode

Signed-off-by: Stuart Henderson <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC

Signed-off-by: Stuart Henderson <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>

block: remove disk_stack_limits

disk_stack_limits is unused now, remove it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: remove mddev->queue

Just use the request_queue from the gendisk pointer in the relatively
few places that sill need it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: don't initialize queue limits

Initial queue limits are now set from ->run. Remove the superfluous
initialization in md_alloc and level_store.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md/raid10: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into separate helpers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md/raid5: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into separate helpers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md/raid1: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into a separate helper function.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md/raid0: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into a separate helper function.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: add queue limit helpers

Add a few helpers that wrap the block queue limits API for use in MD.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: add a mddev_is_dm helper

Add a helper to check for a DM-mapped MD device instead of using
the obfuscated ->gendisk or ->queue NULL checks.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: add a mddev_add_trace_msg helper

Add a small wrapper around blk_add_trace_msg that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

md: add a mddev_trace_remap helper

Add a helper to trace bio remapping that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed--by: Song Liu <[email protected]>
Tested-by: Song Liu <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

Merge tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Get rid of copy_mc flag in iov_iter which really only makes sense for
   the core dumping code so move it out of the generic iov iter code and
   make it coredump's problem. See the detailed commit description.

- Revert fs/aio: Make io_cancel() generate completions again

   The initial fix here was predicated on the assumption that calling
   ki_cancel() didn't complete aio requests. However, that turned out to
   be wrong since the two drivers that actually make use of this set a
   cancellation function that performs the cancellation correctly. So
   revert this change.

- Ensure that the test for IOCB_AIO_RW always happens before the read
   from ki_ctx.

* tag 'vfs-6.8-release.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iov_iter: get rid of 'copy_mc' flag
  fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
  Revert "fs/aio: Make io_cancel() generate completions again"

Merge tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
"These should be the final fixes for the soc tree for 6.8, as usual
  they mostly deal wtih dts files:

   - Qualcomm fixes for pcie4 on sc8280xp, a revert of msm8996 mpm
     support, sm6115 interconnect and sm8650 gpio.

   - Two fixes for Tegra234 ethernet

   - A Makefile fix to actually build the allwinner based orange pi zero
     2w device tree

   - Fixes for clocks and reset on imx8mp and a DSI display regression
     on imx7.

  The non-DT fixes are:

   - Firmware fixes addressing a kernel panic in op-tee and a minor
     regression in microchip/riscv.

   - A defconfig change to bring back backlight support after a Kconfig
     change"

* tag 'arm-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: microchip: Fix over-requested allocation size
  tee: optee: Fix kernel panic caused by incorrect error handling
  Revert "arm64: dts: qcom: msm8996: Hook up MPM"
  arm64: dts: qcom: sc8280xp-x13s: limit pcie4 link speed
  arm64: dts: qcom: sc8280xp-crd: limit pcie4 link speed
  arm64: dts: imx8mp: Fix LDB clocks property
  arm64: dts: imx8mp: Fix TC9595 reset GPIO on DH i.MX8M Plus DHCOM SoM
  MAINTAINERS: Use a proper mailinglist for NXP i.MX development
  ARM: dts: imx7: remove DSI port endpoints
  arm64: dts: allwinner: h616: Add Orange Pi Zero 2W to Makefile
  ARM: imx_v6_v7_defconfig: Restore CONFIG_BACKLIGHT_CLASS_DEVICE
  arm64: tegra: Fix Tegra234 MGBE power-domains
  arm64: tegra: Set the correct PHY mode for MGBE
  arm64: dts: qcom: sm6115: Fix missing interconnect-names
  arm64: dts: qcom: sm8650-mtp: add gpio74 as reserved gpio
  arm64: dts: qcom: sm8650-qrd: add gpio74 as reserved gpio

Merge tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"Fix potential use-after-frees in rk3288 and sun8i-ce"

* tag 'v6.8-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rk3288 - Fix use after free in unprepare
crypto: sun8i-ce - Fix use after free in unprepare

bcache: move calculation of stripe_size and io_opt into bcache_device_init

bcache currently calculates the stripe size for the non-cached_dev
case directly in bcache_device_init, but for the cached_dev case it does
it in the caller. Consolidate it in one places, which also enables
setting the io_opt queue_limit before allocating the gendisk so that it
can be passed in instead of changing the limit just after the allocation.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Coly Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

virtio_blk: Do not use disk_set_max_open/active_zones()

In virtblk_read_zoned_limits(), setting a zoned block device maximum
number of open and active zones using the functions
disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect
as setting the limits for the request queue is now done atomically when
the gendisk is created (with blk_mq_alloc_disk()). The value set by the
disk_set_max_open/active_zones() functions will be overwritten.
Fix this by setting the maximum number of open and active zones directly
in the queue_limits structure passed to virtblk_read_zoned_limits().

Fixes: 8b837256560c ("virtio_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts

This patch is against CVE-2023-6270. The description of cve is:

  A flaw was found in the ATA over Ethernet (AoE) driver in the Linux
  kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on
  `struct net_device`, and a use-after-free can be triggered by racing
  between the free on the struct and the access through the `skbtxq`
  global queue. This could lead to a denial of service condition or
  potential code execution.

In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial
code is finished. But the net_device ifp will still be used in
later tx()->dev_queue_xmit() in kthread. Which means that the
dev_put(ifp) should NOT be called in the success path of skb
initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into
use-after-free because the net_device is freed.

This patch removed the dev_put(ifp) in the success path in
aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx().

Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: 7562f876cd93 ("[NET]: Rework dev_base via list_head (v3)")
Signed-off-by: Chun-Yi Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: move capacity validation to blkpg_do_ioctl()

Commit 6d4e80db4ebe ("block: add capacity validation in
bdev_add_partition()") add check of partition's start and end sectors to
prevent exceeding the size of the disk when adding partitions. However,
there is still no check for resizing partitions now.
Move the check to blkpg_do_ioctl() to cover resizing partitions.

Signed-off-by: Li Lingfeng <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: prevent division by zero in blk_rq_stat_sum()

The expression dst->nr_samples + src->nr_samples may
have zero value on overflow. It is necessary to add
a check to avoid division by zero.

Found by Linux Verification Center (linuxtesting.org) with Svace.

Signed-off-by: Roman Smirnov <[email protected]>
Reviewed-by: Sergey Shtylyov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: atomically update queue limits in drbd_reconsider_queue_parameters

Switch drbd_reconsider_queue_parameters to set up the queue parameters
in an on-stack queue_limits structure and apply the atomically. Remove
various helpers that have become so trivial that they can be folded into
drbd_reconsider_queue_parameters.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: split out a drbd_discard_supported helper

Add a helper to check if discard is supported for a given connection /
backing device combination.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Philipp Reisner <[email protected]>
Reviewed-by: Lars Ellenberg <[email protected]>
Tested-by: Christoph Böhmwalder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: don't set max_write_zeroes_sectors in decide_on_discard_support

fixup_write_zeroes always overrides the max_write_zeroes_sectors value
a little further down the callchain, so don't bother to setup a limit
in decide_on_discard_support.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Philipp Reisner <[email protected]>
Reviewed-by: Lars Ellenberg <[email protected]>
Tested-by: Christoph Böhmwalder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters

drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters
and there is no really clear boundary of responsibilities between the
two.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Philipp Reisner <[email protected]>
Reviewed-by: Lars Ellenberg <[email protected]>
Tested-by: Christoph Böhmwalder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: refactor the backing dev max_segments calculation

Factor out a drbd_backing_dev_max_segments helper that checks the
backing device limitation.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Philipp Reisner <[email protected]>
Reviewed-by: Lars Ellenberg <[email protected]>
Tested-by: Christoph Böhmwalder <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: refactor drbd_reconsider_queue_parameters

Split out a drbd_max_peer_bio_size helper for the peer I/O size,
and condense the various checks to a nested min3(..., max())) instead
of using a lot of local variables.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

drbd: pass the max_hw_sectors limit to blk_alloc_disk

Pass a queue_limits structure with the max_hw_sectors limit to
blk_alloc_disk instead of updating the limit on the allocated gendisk.

Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

sed-opal: Remove the ret variable from the function

The ret variable in the function has not yet been effective and can be
removed.

Signed-off-by: Li kunyu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

sed-opal: Remove unnecessary ‘0’ values from ret

ret is assigned first, so it does not need to initialize the assignment.

Signed-off-by: Li kunyu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

sed-opal: Remove unnecessary ‘0’ values from err

err is assigned first, so it does not need to initialize the assignment.

Signed-off-by: Li zeming <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

sed-opal: Remove unnecessary ‘0’ values from error

error is assigned first, so it does not need to initialize the assignment.

Signed-off-by: Li zeming <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: make block_class constant

Since commit 43a7206b0963 ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the block_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman <[email protected]>
Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ricardo B. Marliere <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Merge tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block

Pull MD fixes from Song:

"This set fixes two issues:

1. dmraid regression since 6.7 kernels. This issue was initially
    reported in [1]. This set of fix has been reviewed and tested by
    md and dm folks.

2. raid5 hang since 6.7 kernel, reported in [2]. We haven't got a
    better fix for this issue yet. This revert is a workaround. It has
    been applied to 6.7 stable kernels [3], and proved to be affective.
    We will look more into this issue for a better fix.

[1] https://lore.kernel.org/linux-raid/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
[2] https://lore.kernel.org/linux-raid/20240123005700 [email protected]/
[3] 87165c64fe1a in linux-6.7.y branch."

* tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  dm-raid: fix lockdep waring in "pers->hot_add_disk"
  dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
  dm-raid: add a new helper prepare_suspend() in md_personality
  md/dm-raid: don't call md_reap_sync_thread() directly
  dm-raid: really frozen sync_thread during suspend
  md: add a new helper reshape_interrupted()
  md: export helper md_is_rdwr()
  md: export helpers to stop sync_thread
  md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
  Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""

dasd: use the atomic queue limits API

Pass the constant limits directly to blk_mq_alloc_disk, set the nonrot
flag there as well, and then use the commit API to change the transfer
size and logical block size dependent values.

This relies on the assumption that no I/O can be pending before the
devices moves into the ready state and doesn't need extra freezing
for changes to the queue limits.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

dasd: move queue setup to common code

Most of the code in setup_blk_queue is shared between all disciplines.
Move it to common code and leave a method to query the maximum number
of transferable blocks, and a flag to indicate discard support.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

dasd: cleamup dasd_state_basic_to_ready

Reflow dasd_state_basic_to_ready a bit to make it easier to modify.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Stefan Haberland <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block: Fix page refcounts for unaligned buffers in __bio_release_pages()

Fix an incorrect number of pages being released for buffers that do not
start at the beginning of a page.

Fixes: 1b151e2435fc ("block: Remove special-casing of compound pages")
Cc: [email protected]
Signed-off-by: Tony Battersby <[email protected]>
Tested-by: Greg Edwards <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Revert "drm/udl: Add ARGB8888 as a format"

This reverts commit 95bf25bb9ed5dedb7fb39f76489f7d6843ab0475.

Apparently there was a previous discussion about emulation of formats
and it was decided XRGB8888 was the only format to support for legacy
userspace [1]. Remove ARGB8888. Userspace needs to be fixed to accept
XRGB8888.

[1] https://lore.kernel.org/r/60dc7697-d7a0-4bf4-a22e-32f1bbb792c2@suse.de

Acked-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306063721.1.I4a32475190334e1fa4eef4700ecd2787a43c94b5@changeid

phy: qcom-qmp-combo: fix type-c switch registration

Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of
-EPROBE_DEFER")).

Move registration of the typec switch to after looking up clocks and
other resources.

Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).

Fixes: 2851117f8f42 ("phy: qcom-qmp-combo: Introduce orientation switching")
Cc: [email protected] # 6.5
Cc: Bjorn Andersson <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Acked-by: Vinod Koul <[email protected]>
Acked-by: Neil Armstrong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>

phy: qcom-qmp-combo: fix drm bridge registration

Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of
-EPROBE_DEFER")).

This could potentially also trigger a bug in the DRM bridge
implementation which does not expect bridges to go away even if device
links may avoid triggering this (when enabled).

Move registration of the DRM aux bridge to after looking up clocks and
other resources.

Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).

Fixes: 35921910bbd0 ("phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE")
Fixes: 1904c3f578dc ("phy: qcom-qmp-combo: Introduce drm_bridge")
Cc: [email protected] # 6.5
Cc: Bjorn Andersson <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Acked-by: Vinod Koul <[email protected]>
Acked-by: Neil Armstrong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>

nvme: clear caller pointer on identify failure

The memory allocated for the identification is freed on failure. Set
it to NULL so the caller doesn't have a pointer to that freed address.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

nvme: host: fix double-free of struct nvme_id_ns in ns_update_nuse()

When nvme_identify_ns() fails, it frees the pointer to the struct
nvme_id_ns before it returns. However, ns_update_nuse() calls kfree()
for the pointer even when nvme_identify_ns() fails. This results in
KASAN double-free, which was observed with blktests nvme/045 with
proposed patches [1] on the kernel v6.8-rc7. Fix the double-free by
skipping kfree() when nvme_identify_ns() fails.

Link: https://lore.kernel.org/linux-block/[email protected]/
Fixes: a1a825ab6a60 ("nvme: add csi, ms and nuse to sysfs")
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Daniel Wagner <[email protected]>
Reviewed-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Keith Busch <[email protected]>

timer/migration: Fix quick check reporting late expiry

When a CPU is the last active in the hierarchy and it tries to enter
into idle, the quick check looking up the next event towards cpuidle
heuristics may report a too late expiry, such as in the following
scenario:

                        [GRP1:0]
                     migrator = NONE
                     active   = NONE
                     nextevt  = T0:0, T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = NONE           migrator = NONE
       active   = NONE           active   = NONE
       nextevt  = T0, T1         nextevt  = T2
       /         \                /         \
      0           1              2           3
    idle       idle           idle         idle

0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
expire order is T0 < T1 < T2.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:0(i), T0:1
                   /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T0(i), T1      nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

1) CPU 0 becomes active. The (i) means a now ignored timer.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T1             nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

2) CPU 0 handles remote. No timer actually expired but ignored timers
   have been cleaned out and their sibling's timers haven't been
   propagated. As a result the top level's next event is T2 and not T1.

3) CPU 0 tries to enter idle without any global timer enqueued and calls
   tmigr_quick_check(). The expiry of T2 is returned instead of the
   expiry of T1.

When the quick check returns an expiry that is too late, the cpuidle
governor may pick up a C-state that is too deep. This may be result into
undesired CPU wake up latency if the next timer is actually close enough.

Fix this with assuming that expiries aren't sorted top-down while
performing the quick check. Pick up instead the earliest encountered one
while walking up the hierarchy.

7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

drm/i915/panelreplay: Move out psr_init_dpcd() from init_connector()

Move psr_init_dpcd() from init-connector to connector-detect
function. The dpcd probe for checking panel replay capability
for external dp connector is causing delay during boot which can
be optimized by moving dpcd probe to connector specific detect().

v1: Initial version.
v2: Add details in commit description. [Jani]

Suggested-by: Ville Syrjälä <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10284
Signed-off-by: Animesh Manna <[email protected]>
Fixes: cceeaa312d39 ("drm/i915/panelreplay: Enable panel replay dpcd initialization for DP")
Reviewed-by: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 1cca19bf296fae0636a637b48d195ac6b4d430c9)
Signed-off-by: Joonas Lahtinen <[email protected]>

x86/topology: Ignore non-present APIC IDs in a present package

Borislav reported that one of his systems has a broken MADT table which
advertises eight present APICs and 24 non-present APICs in the same
package.

The non-present ones are considered hot-pluggable by the topology
evaluation code, which is obviously bogus as there is no way to hot-plug
within the same package.

As the topology evaluation code accounts for hot-pluggable CPUs in a
package, the maximum number of cores per package is computed wrong, which
in turn causes the uncore performance counter driver to access non-existing
MSRs. It will probably confuse other entities which rely on the maximum
number of cores and threads per package too.

Cure this by ignoring hot-pluggable APIC IDs within a present package.

In theory it would be reasonable to just do this unconditionally, but then
there is this thing called reality^Wvirtualization which ruins
everything. Virtualization is the only existing user of "physical" hotplug
and the virtualization tools allow the above scenario. Whether that is
actually in use or not is unknown.

As it can be argued that the virtualization case is not affected by the
issues which exposed the reported problem, allow the bogosity if the kernel
determined that it is running in a VM for now.

Fixes: 89b0f15f408f ("x86/cpu/topology: Get rid of cpuinfo::x86_max_cores")
Reported-by: Borislav Petkov (AMD) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/87a5nbvccx.ffs@tglx

firewire: ohci: prevent leak of left-over IRQ on unbind

Commit 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
also removed the call to free_irq() in pci_remove(), leading to a
leftover irq of devm_request_irq() at pci_disable_msi() in pci_remove()
when unbinding the driver from the device

remove_proc_entry: removing non-empty directory 'irq/136', leaking at
least 'firewire_ohci'
Call Trace:
? remove_proc_entry+0x19c/0x1c0
? __warn+0x81/0x130
? remove_proc_entry+0x19c/0x1c0
? report_bug+0x171/0x1a0
? console_unlock+0x78/0x120
? handle_bug+0x3c/0x80
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? remove_proc_entry+0x19c/0x1c0
unregister_irq_proc+0xf4/0x120
free_desc+0x3d/0xe0
? kfree+0x29f/0x2f0
irq_free_descs+0x47/0x70
msi_domain_free_locked.part.0+0x19d/0x1d0
msi_domain_free_irqs_all_locked+0x81/0xc0
pci_free_msi_irqs+0x12/0x40
pci_disable_msi+0x4c/0x60
pci_remove+0x9d/0xc0 [firewire_ohci
01b483699bebf9cb07a3d69df0aa2bee71db1b26]
pci_device_remove+0x37/0xa0
device_release_driver_internal+0x19f/0x200
unbind_store+0xa1/0xb0

remove irq with devm_free_irq() before pci_disable_msi()
also remove it in fail_msi: of pci_probe() as this would lead to
an identical leak

Cc: [email protected]
Fixes: 5a95f1ded28691e6 ("firewire: ohci: use devres for requested IRQ")
Signed-off-by: Edmund Raile <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Sakamoto <[email protected]>

drm/i915/dp: Fix connector DSC HW state readout

The DSC HW state of DP connectors is read out during driver loading and
system resume in intel_modeset_update_connector_atomic_state(). This
function is called for all connectors though and so the state of DSI
connectors will also get updated incorrectly, triggering a WARN there
wrt. the DSC decompression AUX device.

Fix the above by moving the DSC state readout to a new DP connector
specific sync_state() hook. This is anyway the logical place to update
the connector object's state vs. the connector's atomic state.

Fixes: b2608c6b3212 ("drm/i915/dp_mst: Enable MST DSC decompression for all streams")
Reported-and-tested-by: Drew Davenport <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Reviewed-by: Ankit Nautiyal <[email protected]>
Signed-off-by: Imre Deak <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a62e145981500996ea76af3d740ce0c0d74c5be0)
Signed-off-by: Joonas Lahtinen <[email protected]>

drm/i915/selftests: Fix dependency of some timeouts on HZ

Third argument of i915_request_wait() accepts a timeout value in jiffies.
Most users pass either a simple HZ based expression, or a result of
msecs_to_jiffies(), or MAX_SCHEDULE_TIMEOUT, or a very small number not
exceeding 4 if applicable as that value. However, there is one user --
intel_selftest_wait_for_rq() -- that passes a WAIT_FOR_RESET_TIME symbol,
defined as a large constant value that most probably represents a desired
timeout in ms. While that usage results in the intended value of timeout
on usual x86_64 kernel configurations, it is not portable across different
architectures and custom kernel configs.

Rename the symbol to clearly indicate intended units and convert it to
jiffies before use.

Fixes: 3a4bfa091c46 ("drm/i915/selftest: Fix workarounds selftest for GuC submission")
Signed-off-by: Janusz Krzysztofik <[email protected]>
Cc: Rahul Kumar Singh <[email protected]>
Cc: John Harrison <[email protected]>
Cc: Matthew Brost <[email protected]>
Reviewed-by: Andi Shyti <[email protected]>
Signed-off-by: Andi Shyti <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 6ee3f54b880c91ab2e244eb4ffd4bfed37832b25)
Signed-off-by: Joonas Lahtinen <[email protected]>